[Figure 1: a customer sends a service request to the front-end node, which hosts the service interface, SLA management framework, application deployer, monitor framework and VM configurators; the computing environment nodes run a virtualization layer over the physical resources. Steps: 1. service request + response; 2. setup VMs; 3. validate request; 4. service request; 5. deploy; 6. monitor.]
Fig. 1. CASViD Architecture.
Customers place their service requests through a defined
interface to the front-end node (step 1, Figure 1), which
acts as the management node in the Cloud environment. The
VM configurator sets up the Cloud environment by deploying
preconfigured VM images (step 2) on physical machines and
making them accessible for service provisioning. The request
is received by the service interface and delivered to the SLA
management framework for validation (step 3), which is done
to ensure that the request comes from the right customer. In
the next step the service request is passed to the application
deployer (step 4), which allocates resources for the service
execution and deploys it in the Cloud environment (step 5).
After deploying the service application, CASViD monitors the
application execution and sends the monitored information to
the SLA management framework (step 6) for processing and
detection of SLA violations.
The VM configurator and application deployer are components for allocating resources and deploying applications on our Cloud testbed. They are included in the architecture to show our complete solution. The Application Deployer is responsible for managing the execution of user applications, similar to brokers in the Grid literature [1], [16], [26], [30], focusing on parameter sweeping executions [11]. It simplifies the processes of transferring application input data to each VM, starting the execution, and collecting the results from the VMs to the front-end node. The mapping of application tasks to VMs is performed dynamically by a scheduler located in the Application Deployer: each slave process consumes tasks whenever the VM is idle. Further details on this component and the VM configurator are found in our previous work [17],
[18]. The execution of the applications and the monitoring
process can be done automatically by the Cloud provider,
or can be incorporated into a Cloud Service that can be
instantiated by the users.
The proposed CASViD architecture is generic in its usage
as it is not designed for a particular set of applications. The
service interface supports the provisioning of transactional as
well as computational applications. The SLA management
framework can handle the provisioning of all application
types based on the pre-negotiated SLAs. Description of the
negotiation process and components is out of scope of this
paper and is discussed by Brandic et al. [8].
A. System and Application Monitor
CASViD architecture contains a flexible monitoring framework based on the SNMP (Simple Network Management Protocol) standard [12]. It receives instructions to monitor applications from the SLA management framework and delivers the monitored information. It is based on the traditional manager/agent model used in network management. Figure 2 presents the monitor architecture. The manager, located in the management node, periodically polls each agent in the cluster to get the monitored information. In order to enhance its scalability, the monitor uses asynchronous communication with all cluster agents. It is composed of a library and an agent. The monitor agent implements the methods to capture each metric defined in the CASViD monitor MIB (Management Information Base). At the manager side, the monitor library provides methods to configure which metrics should be captured and which nodes should be included in the monitoring. The SLA management framework in the system architecture uses this library to configure the monitoring process and retrieve the desired metrics. The retrieval can be done by collecting the metric information from application or operating system log files.
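The manager/agent polling model described above can be sketched in Python. This is a schematic stand-in, not a real SNMP implementation: the class names, the metric names, and the thread-pool fan-out are illustrative assumptions.

```python
# Illustrative sketch (not real SNMP): a manager that polls per-node agents
# asynchronously and aggregates the metrics they expose.
from concurrent.futures import ThreadPoolExecutor

class MonitorAgent:
    """Stands in for an SNMP agent exposing metrics from a per-node MIB."""
    def __init__(self, node, metrics):
        self.node = node
        self._metrics = metrics          # e.g. parsed from /proc or app logs

    def get(self, names):
        return {n: self._metrics[n] for n in names if n in self._metrics}

class MonitorManager:
    """Stands in for the manager-side monitor library on the front-end node."""
    def __init__(self, agents):
        self.agents = agents
        self.watched = []                # metrics configured by the SLA framework

    def configure(self, metric_names):
        self.watched = list(metric_names)

    def poll(self):
        # Asynchronous fan-out to all agents, mirroring the scalability
        # argument made in the text.
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            results = pool.map(lambda a: (a.node, a.get(self.watched)), self.agents)
        return dict(results)

agents = [MonitorAgent("node1", {"cpu": 0.71, "mem": 0.40}),
          MonitorAgent("node2", {"cpu": 0.15, "mem": 0.90})]
manager = MonitorManager(agents)
manager.configure(["cpu"])
snapshot = manager.poll()   # {"node1": {"cpu": 0.71}, "node2": {"cpu": 0.15}}
```

The SLA management framework would call `configure` once and `poll` on each monitoring interval.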
[Figure 2: the management node runs the management system and monitor library, which communicate via the SNMP protocol with each processing node's SNMP agent; the monitor agent and monitor MIB on each node draw on /proc and the application folders.]
Fig. 2. CASViD Monitor Overview.
Similar to other monitoring systems [21], [29], the CASViD monitor is general-purpose and supports the acquisition of common application metrics as well as system metrics such as CPU and memory utilization. The application metrics (SLA parameters) to be monitored depend on the application type and on how its performance is to be ensured.
B. SLA Management Framework
The service provisioning management and detection of ap-
plication SLA objective violations are performed by the SLA
management framework component. This component is central
and interacts with the Service Interface, Application Deployer,
and CASViD monitor. In order to manage the SLA violations, this framework performs resource allocation to services, service scheduling, application monitoring, and SLA violation detection (Figure 1).
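As a minimal illustration of the violation-detection step, the framework can be thought of as comparing monitored values against pre-negotiated objectives. The function, metric names, and thresholds below are invented for the sketch and do not come from the CASViD paper.

```python
# Hypothetical sketch of SLA violation detection: each SLA objective is
# assumed to be a simple upper bound on a monitored metric.
def detect_violations(sla_objectives, monitored):
    """sla_objectives: metric -> maximum allowed value (assumed form)."""
    return [(metric, value, limit)
            for metric, limit in sla_objectives.items()
            for value in [monitored.get(metric)]
            if value is not None and value > limit]

sla = {"response_time_ms": 200, "cpu_utilization": 0.85}
observed = {"response_time_ms": 340, "cpu_utilization": 0.60}
violations = detect_violations(sla, observed)
# -> [("response_time_ms", 340, 200)]
```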
MODAClouds
MOdel-Driven Approach for design and execution of applications on multiple Clouds, Deliverable # D6.1
Public Final Version 1.0, March 29th 2013
restrictive than the violation thresholds. With this information the system can react quickly to avert the violation
threat and save the Cloud provider from costly SLA violation penalties.
3.4.3. A multi-layer approach for cloud application monitoring [Gon11]
Hierarchical monitoring and analysis is a methodology for refining monitoring data and analysis results in order to achieve higher precision while reducing the amount of data to be analysed. In the context of Cloud computing it can be applied both to lighten the load (i.e., the amount of data to be analysed) and to reason over monitoring data. This work proposes a three-dimensional approach for cloud application monitoring encompassing the Local Application Surveillance (LAS), Intra Platform Surveillance (IPS) and Global Application Surveillance (GAS) dimensions, with the interconnections and subcomponents shown in Figure 3.4.c. LAS monitors an application instance to check for rule violations. For further analysis, the output of each LAS is sent to its assigned IPS, an additional monitoring mechanism at the level of one particular VE. The IPS analyses data from the different VMs running on the same machine, looking for issues arising from interactions between VMs and between the applications running on the same VM. The filtered results are then sent to the GAS components for further analysis.
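The three-layer refinement could be sketched as follows; the rule format, function names and instance names are hypothetical, chosen only to show how each layer reduces the data forwarded upward.

```python
# Schematic of the LAS -> IPS -> GAS pipeline described above.
def las_check(instance_metrics, rules):
    """Local Application Surveillance: flag per-instance rule violations."""
    return [name for name, limit in rules.items()
            if instance_metrics.get(name, 0) > limit]

def ips_filter(las_reports):
    """Intra Platform Surveillance: keep only instances that violated a rule,
    reducing the volume of data sent to the global layer."""
    return {inst: v for inst, v in las_reports.items() if v}

def gas_summarize(ips_outputs):
    """Global Application Surveillance: merge filtered reports from machines."""
    merged = {}
    for machine_report in ips_outputs:
        merged.update(machine_report)
    return merged

rules = {"latency_ms": 100}
reports = {"inst1": las_check({"latency_ms": 150}, rules),
           "inst2": las_check({"latency_ms": 40}, rules)}
global_view = gas_summarize([ips_filter(reports)])   # {"inst1": ["latency_ms"]}
```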
Figure 3.4.c. LAS, IPS and GAS layers.
The optional GAS component, of which one is assigned per application (not per instance), monitors the software and detects modelling and implementation problems by analysing data from different machines (i.e., from several IPS components) that refer to the same application. The global view of the GAS components reveals the behaviour of the software in different virtualized environments, yielding conclusions useful to both the application's users and developers.
3.4.4. Cloud Application Monitoring: the mOSAIC Approach [Rak11]
The mOSAIC API facilitates building custom monitoring systems for Cloud applications. The mOSAIC approach as a whole comprises four modules: the API, the framework (i.e., the platform), the provisioning system, and the semantic engine. The API and the framework aim at the development of portable, provider-independent applications. The provisioning system works at the IaaS level and handles resource management; its functionality is part of the Cloud agency [Ven11]. The framework is a collection of predefined Cloud components for building complex applications; it constitutes a PaaS enabling the execution of complex services with predefined interfaces. The mOSAIC SLA management
components are also part of the Framework. The API implements a programming model in a given language (currently Java, with Python planned) for building applications. It provides new concepts (e.g., the Cloudlet and the Connector) so that developers can focus on Cloud resources and communications instead of resource-access or communication details. The mOSAIC architecture is depicted in Figure 3.4.d.
Figure 3.4.d. The mOSAIC Architecture
Resource monitoring is implemented by the Cloud agency. The Archiver, a monitoring agent offered by the agency, collects monitoring information from the agents distributed over the monitored resources and stores the messages in a storage system. The monitoring agent can also collect information from common monitoring systems (e.g., Ganglia, Nagios, SNMP-based applications) and publish it to the same storage. Applications interact with the Archiver through a connector. The Observer component generates events on the (resource) event bus by accessing the storage filled by the Archiver. The Application Monitoring connector, integrated in the Cloudlet, is responsible for monitoring application components and generating events on the connected buses. Application components can then share the monitoring information and manage the related events.
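The Archiver/Observer interplay described above could be sketched like this. All class names and the record format are illustrative, not the actual mOSAIC implementation.

```python
# Sketch of the pattern: distributed agents feed an Archiver, which persists
# records in shared storage; an Observer scans that storage and emits events
# on a bus.
class Storage:
    def __init__(self):
        self.records = []

class Archiver:
    """Collects agent messages and persists them in the storage system."""
    def __init__(self, storage):
        self.storage = storage

    def collect(self, agent_id, payload):
        self.storage.records.append({"agent": agent_id, **payload})

class Observer:
    """Reads the storage filled by the Archiver and generates bus events."""
    def __init__(self, storage, bus):
        self.storage, self.bus, self._seen = storage, bus, 0

    def scan(self):
        for rec in self.storage.records[self._seen:]:
            self.bus.append(("resource-event", rec))
        self._seen = len(self.storage.records)

bus, store = [], Storage()
Archiver(store).collect("vm-7", {"cpu": 0.93})
Observer(store, bus).scan()
# bus now holds one ("resource-event", ...) entry for the stored record
```

Decoupling producers (agents) from consumers (the event bus) through storage is what lets applications access monitoring data through a connector rather than talking to agents directly.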
The mOSAIC monitoring API offers a set of connectors representing an abstraction of resource monitoring, and a set of drivers implementing different ways of acquiring monitoring data. It therefore supports monitoring by (i) offering a way to collect data directly from any component of a mOSAIC application, (ii) offering a way to collect data via any of the proposed monitoring techniques (Cloud-provider tools, resource-related tools, and the mOSAIC monitoring tools, called the M/W (monitoring/warning) system), and (iii) letting a mOSAIC Cloud application access the data regardless of the technology of the acquired resources and the way they are monitored. The mOSAIC monitoring tools offered by the framework aim to make it possible to build up a dedicated monitoring system.
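The connector/driver split can be illustrated with a minimal sketch; the class names and driver behaviours are invented and this is not the real mOSAIC API.

```python
# Sketch of the abstraction: one application-facing connector interface,
# interchangeable drivers for different acquisition techniques.
class Driver:
    def acquire(self):
        raise NotImplementedError

class ProviderApiDriver(Driver):
    """Stands in for data pulled from a Cloud provider's own monitoring."""
    def acquire(self):
        return {"source": "provider", "cpu": 0.5}

class WarningSystemDriver(Driver):
    """Stands in for the M/W (monitoring/warning) acquisition path."""
    def acquire(self):
        return {"source": "m/w", "cpu": 0.5}

class MonitoringConnector:
    """The application reads metrics the same way regardless of which
    driver acquired them."""
    def __init__(self, driver):
        self.driver = driver

    def read(self):
        data = self.driver.acquire()
        return {k: v for k, v in data.items() if k != "source"}

same = (MonitoringConnector(ProviderApiDriver()).read()
        == MonitoringConnector(WarningSystemDriver()).read())   # True
```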
3.4.5. M4Cloud, A Generic Application Level Monitoring [Mas11]
This model-driven approach classifies and monitors application-level metrics in shared environments such as the Cloud. The basis for the implementation of the monitoring phase is the Cloud Metric Classification (CMC). CMC identifies four models: application based (e.g., generic/specific), measurement based (e.g., direct/calculable), implementation based (e.g., shared/individual) and nature based (e.g., quantity/quality). The application based model distinguishes metrics on the basis of the application they belong to. The measurement based model defines the formulas from which metrics can be calculated; the implementation based model defines for each metric the corresponding measurement mechanisms,
Figure 1: mOSAIC Monitoring Components Architecture
by the user, publish resource-related information, such as the CPU usage, at given time intervals. The mOSAIC developer has the role of developing the monitoring Cloudlet, which, through the connectors to the Event bus, is able to retrieve all the monitored events. Moreover, it is able to access the Cloud Agency in order to retrieve more general-purpose information.
6 Related Work
As discussed in previous sections, in monitoring of
Cloud-based applications, we can distinguish two sepa-
rate levels: the infrastructure level and the application-level
monitoring. Infrastructure-level resource monitoring [7]
aims at the measurement and reporting of system parame-
ters related to real- or virtual infrastructure services offered
to the user (e.g. CPU, RAM, or data storage parameters). At
this level, some of the better advertised Cloud-monitoring
solutions include the Nimsoft Monitor [4], Monitis [3], or
the Tap in Cloud Management Service [5]. These services
cover different subsets of Cloud-service providers, and they
support different measurements based on the actual Cloud
provider that is being monitored. On the application level,
the nature of the monitored parameters, and the way their
values should be retrieved depend on the actual software
being monitored, and not on the Cloud infrastructure it is
running on. VMware vFabric Hyperic [6], e.g., specializes in web-application monitoring, and is capable of monitoring applications that utilize any of the approximately 75 supported web technologies. A more general approach is to utilize the JMX Java framework [1], which is employed by most Java application containers and is capable of providing information on the status of the application running in the container. This, however, requires that the application is written in Java and that it is prepared to publish information through the JMX subsystem. Regardless of the level of the parameters being monitored, a general monitoring infrastructure is required to collect and process the information provided by the monitored components. Such an infrastructure
is provided by the Lattice framework [2] (also utilized in
the RESERVOIR project [13]) that has a minimal run-time
footprint and is not intrusive, so as not to adversely affect
the performance of the system itself or any running applications. The framework defines a system of data sources, data consumers, and control strategies that influence the collection of monitoring data. The monitoring data can be transported over IP multicast solutions, Event Service Bus, or a publish/subscribe mechanism. This is a very flexible framework, which is tailored towards distributed applications, but not Cloud applications. In contrast, in the mOSAIC monitoring subsystem, we can utilize such Cloud-oriented services as reliable messaging and flexible storage options for the measurement data.
coherently with the formulas defined at the previous step; finally, the nature based model defines the nature of the metrics and their definition within SLAs. More information on the models can be found in the original article [Mas11].
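Classifying a single metric along the four CMC dimensions named above could look as follows; the field values are examples chosen for illustration, not taken from [Mas11].

```python
# Illustrative sketch: one metric tagged along the four CMC dimensions.
from dataclasses import dataclass

@dataclass
class CmcMetric:
    name: str
    application_based: str     # "generic" or "specific"
    measurement_based: str     # "direct" or "calculable"
    implementation_based: str  # "shared" or "individual"
    nature_based: str          # "quantity" or "quality"

response_time = CmcMetric(
    name="response_time",
    application_based="generic",      # meaningful for many applications
    measurement_based="calculable",   # derived from request/response timestamps
    implementation_based="shared",    # the measuring mechanism can be shared
    nature_based="quantity",          # numeric, directly usable in an SLA
)
```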
CMC is part of the M4Cloud framework, as shown in Figure 3.4.e. In this framework, the FoSII infrastructure [Bra10] is used as a Cloud Management System (CMS). Monitored data is analysed, stored in a knowledge database, and then used for planning actions. Monitored data is also acquired and analysed after the execution of such actions, in order to evaluate their efficiency.
Figure 3.4.e. Architecture of M4Cloud.
3.4.6. REMO, a Resource-Aware Application State Monitoring approach [Men08]
Cost effectiveness and scalability are among the main criteria in developing a monitoring infrastructure for large-scale distributed applications. REMO addresses the challenge of constructing monitoring overlays from the cost and scalability points of view, jointly considering inter-task cost-sharing opportunities and node-level resource constraints. Processing overhead is modelled on a per-message basis. The approach deploys a forest of optimized monitoring trees through iterations of two phases, exploring cost-sharing opportunities between tasks and refining the trees with resource-sensitive construction schemes. In each iteration a partition augmentation procedure generates a list of the most promising augmentations for improving the current distribution of workload among trees, using cost estimation to limit the list. These augmentations are then further refined through a resource-aware evaluation procedure, and monitoring trees are built accordingly (through the resource-aware tree construction algorithm).
An adaptive algorithm is also provided for balancing the cost and benefits of the overlay, which is especially useful for large-scale systems with dynamic monitoring tasks.
Planning the monitoring topology and the collection frequency are important factors in balancing monitoring scalability and cost effectiveness. The drawback of approaches proposed to date is that they either build a monitoring topology for each individual monitoring task (e.g., TAG [Mad02], SDIMS [Yal04], PIER [Hue05], join aggregations [Cor05], REED [Aba05], operator placement [Sri05]) or use a static topology for all monitoring tasks [Sri05]; neither is optimal. For instance, two monitoring tasks may collect data over the same nodes. In such a case it is more efficient to use a single monitoring tree for data transmission, as nodes can merge updates for both tasks and reduce per-message processing overhead. It is therefore important to consider topology optimization at the multi-monitoring-task level for the sake of monitoring scalability. Load management is another important factor in monitoring data collection, especially for data-intensive environments, meaning that the
monitoring topology should be able to control the amount of resources spent to collect and deliver the data. Ignoring this can lead to overloading and, consequently, loss of data. The REMO approach addresses all these issues by considering node-level resources when building a monitoring topology, optimizing the topology for scalability, and ensuring that no node is assigned more monitoring workload than its available resources can support.
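The per-message cost-sharing argument can be made concrete with a back-of-the-envelope calculation; the cost model and all numbers are illustrative, not REMO's actual model.

```python
# Toy cost model: each node sends one message per monitoring tree it belongs
# to; a message has a fixed processing cost plus a per-attribute cost.
def tree_cost(n_nodes, msgs_per_node, per_message_cost, per_attr_cost, n_attrs):
    return (n_nodes * msgs_per_node * per_message_cost
            + n_nodes * n_attrs * per_attr_cost)

nodes, c_msg, c_attr = 10, 5.0, 1.0
# Two tasks over the same 10 nodes, 1 attribute each:
separate = tree_cost(nodes, 2, c_msg, c_attr, 2)  # one tree per task
merged   = tree_cost(nodes, 1, c_msg, c_attr, 2)  # one shared tree, merged updates
saving = separate - merged   # 10 nodes x 1 fewer message x 5.0 = 50.0
```

The per-attribute cost is unchanged by merging; only the fixed per-message overhead is saved, which is exactly the sharing opportunity REMO exploits.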
This approach has three main advantages. First, it identifies three critical requirements of large-scale application monitoring: sharing message-processing cost among attributes, meeting node-level resource constraints, and adapting efficiently as monitoring tasks change. Second, it proposes a monitoring framework that optimizes the monitoring topologies and addresses these requirements. Finally, it develops techniques for runtime efficiency and support. The figure below shows the high-level model of REMO, encompassing four components: the task manager, the management core, the data collector and the result processor. The functionality of each component is summarized in Figure 3.4.f.
Figure 3.4.f. The REMO Architecture
3.4.7. Cloud4SOA
Cloud4SOA monitoring offers a unified, platform-independent mechanism to monitor the health and performance of business-critical applications hosted on multiple Cloud environments, in order to ensure that their performance consistently meets the expectations defined in the SLA. To deal with the heterogeneity of different PaaS offerings, Cloud4SOA provides a monitoring functionality based on unified, platform-independent metrics.
The Cloud4SOA monitoring functionality leverages a range of standardized and unified metrics of different natures (resource/infrastructure level, container level, application level, etc.) that, across the disparate underlying cloud providers, allow the runtime monitoring of distributed applications so as to enforce end-to-end QoS, regardless of which PaaS they are deployed on. In the scope of Cloud4SOA several metrics have been defined (Table 3.4.a) from the cloud-resource as well as the business-application perspective, but not all of them are enforced at runtime, since some only provide useful information about the status of the application.
(Cloud4SOA: http://www.cloud4soa.eu)
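The unification idea could be sketched as a mapping from provider-specific metric names onto one platform-independent vocabulary; the providers, metric names and mappings below are invented for illustration.

```python
# Illustrative sketch: normalize provider-specific metrics so the same SLA
# check runs regardless of where the application is deployed.
UNIFIED = {
    "paas_a": {"dyno_load": "cpu_load", "resp_ms": "app_response_time"},
    "paas_b": {"cpu_usage": "cpu_load", "latency": "app_response_time"},
}

def normalize(provider, raw_metrics):
    """Translate a provider's raw metric names into the unified vocabulary,
    dropping anything the unified model does not cover."""
    mapping = UNIFIED[provider]
    return {mapping[k]: v for k, v in raw_metrics.items() if k in mapping}

a = normalize("paas_a", {"dyno_load": 0.4, "resp_ms": 120})
b = normalize("paas_b", {"cpu_usage": 0.4, "latency": 120})
# Both yield {"cpu_load": 0.4, "app_response_time": 120}: comparable end to end.
```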
Table 3.4.a Cloud4SOA Metrics

Metric | Description | App. | Cloud
CPU load | The amount of computational work that the application performs | X |
Memory Load | The amount of memory consumed by the application | X |
HTTP Response Code | Includes custom status messages to understand the health of the application, but also the performance of the cloud | X | X
Application and DB Response Time | Time that measures the efficiency and speed with which servers deliver requested web content to end users | X | X
Application Container Response Time | The elapsed time between the end of an inquiry or demand on a cloud system and the beginning of a response | | X
Cloud Response Time | The time the Cloud needs to process and forward the incoming call to the application | | X
3.5. Cost Monitoring and Measurement Indexes
A common measure of IT cost is the Return on Investment (ROI). Analysts such as Daryl Plummer of Gartner have reviewed the use of ROI calculations in the monitoring and measurement of cloud services, focusing on industrial monitoring and measurement. Other analysts, such as Trevor Pott, examine the complexity of calculating metrics such as ROI in a cloud environment with different infrastructures and legacy operations all in the mix.
The single issue that strikes any reviewer of the state of measurement and monitoring of cloud services is that there is no agreed way of measuring or comparing them. This lack of agreement has led to initiatives from commercial organizations and standards bodies. The World Wide Web Consortium (W3C) initiated an incubator project on the Unified Service Description Language (USDL) as a way of generating consensus. The lack of a public standard for monitoring cloud services is not a problem for private cloud implementations: in-house staff or service providers in a private cloud environment can facilitate interoperable monitoring without concerning themselves with external service measurement. Private clouds are the predominant environments in corporations at present; however, there is growth in rogue IT and in the use of external services, both of which demand a more standardized means of monitoring and measuring cloud services.
A few initiatives are active in describing cloud services in index form. Their goal is to create a standard way of comparing services during the selection process. W3C's USDL incubator is one such initiative. The Cloud Service Measurement Index Consortium (CSMIC) is another, formed by a number of organizations to define measures for comparing service behaviour at service-selection time.
3.5.1. Unified Service Description Language
USDL was the name given to an incubator project from the W3C. USDL extends the state of the art in many fields of service description and can be seen as an extension of work done on the semantic web in general and linked data in particular. USDL is a language-based method of aligning business services through a common description. The incubator group has completed its work and delivered a report containing its recommendations.
It is clear from the report that USDL requires additional work to make it valid for use with cloud services. In particular, there are requirements to create module-specific processes as well as descriptions. A good example is the legal module, which will need different processes for each jurisdiction. Another identified extension to USDL is a language-specific query language. Little work appears to have been completed since the report was published.
3.5.2. Service Measurement Index
The Service Measurement Index (SMI) is a standardization initiative managed by the Cloud Services Measurement Index Consortium, led by Carnegie Mellon University. The consortium currently has 17 member organisations from all areas of IT, including universities, benchmarking specialists, software houses and
systems integrators. All have an interest in defining a common method for describing services. The consortium meets regularly to develop definitions of service attributes and measures. These are measures for service selection rather than for continuous monitoring of cloud service performance, though some can be modified to describe continuous monitoring measures.
SMI is defined as a set of measures, each describing an attribute that is part of a service category.
Table 3.5.a lists the attributes that have been prioritized for the definition of measures; this is the current list and will be expanded later in the exercise. The financial category is of particular interest to MODAClouds as a source of information and potential cost-management information.
Category | Selected Attributes
Accountability | Compliance, Ease of Doing Business, Provider Certifications, Provider contract/SLA verification
Agility | Elasticity, Portability, Scalability
Assurance | Availability, Reliability, Resiliency/fault tolerance
Financial | Acquisition, On-going cost, Transition costs
Performance | Functionality, Interoperability, Service response time
Security and Privacy | Access control & privilege management, Data integrity, Data privacy and data loss
Usability | Accessibility, Learnability, Suitability

Table 3.5.a. List of CSMIC Prioritised Attributes by category
Each attribute has one or more measures defined, and for each measure there is a description template containing a number of fields: the normal identification and meta-data, plus the measure description. The measure description contains information about how the measure is expressed, the frequency and units of measure, and the formula for calculating it. Some formulae are simple yes/no questions, for example "Is the service supplier Sarbanes-Oxley certified?"; other formulae are more complex. Measures are weighted based on their importance to the selector of services.
Data is gathered to allow a measure to be calculated, either by benchmarking or by the contribution of service suppliers. In a prototype exercise in 2011, several cloud service suppliers donated performance, security and quality metrics, and a leading university benchmarked several public services to gather data. This formed the basis of examples of service selection that validated the approach. Since those early days, the basic QARCCS (Quality, Agility, Risks, Cost, Capability, Security) model has been modified into the current list of categories.
The SMI is now in the final stages of internal review of attribute measure definitions; the next stage is for those measures to be passed to the wider community for external review, which should be completed in the second quarter of 2013. During this review period the measures will undergo further refinement. A user scenario and a demonstration tool for calculating service-selection heat-maps are also under development. Defined measures and data will allow service selection and the further development of a standard measure of cloud services from SMI. The use of SMI to select services is illustrated in Figure 3.5.a.
MODAClouds: MOdel-Driven Approach for design and execution of applications on multiple Clouds, Deliverable # D6.1, Public Final Version 1.0, March 29th 2013
Figure 3.5.a. Service selection heat-map for dummy service.
If the user placed a high priority on the security and privacy category, then despite a high score for usability this
service would not be acceptable. Even if a service does not achieve an adequate score in a category, this does not
mean the service is always unacceptable: it may become acceptable with suitable risk mitigation.
Table 3.5.b. Monitoring component in EU Projects
4CaaST: Reuses frameworks from the state of the art: Collected Framework (JMX, to monitor applications that expose MBeans), publish/subscribe middleware (framework used: SilboPS), and JASMINe (collector provided: MbeanCmd). To access monitoring services inside 4CaaSt, a TCloud REST-based API is proposed. The aim is to design a new environment which can use all the different available generic monitoring systems (e.g. Ganglia, collectd, MbeanCmd, etc.).
Cloud4SOA: Front-end (visual information about the status (life cycle) of the deployed applications) and back-end (collects data from the platforms).
Cloud-TM: The skeleton of the prototype implementation of WPM (Workload and Performance Monitor) is based on the Lattice framework; the prototypal implementation of the data-platform-oriented probes is extensively based on the JMX framework.
Contrail: Monitoring is based on the monitoring solution developed in the SLA@SOI project (architecture); the Web hosting service uses Ganglia for monitoring several application-specific parameters.
OPTIMIS: Java RESTful Web Services provide interfaces to downstream components querying data from the Monitoring Infrastructure (e.g. TREC components and the Monitoring website) and to upstream components inserting data into the Monitoring Infrastructure (e.g. probes and scripts); MySQL is used for storage of monitoring data; the Monitoring website is built with the Google Web Toolkit.
Vision Cloud: The Monitoring system is responsible for collecting cluster-level usage records and aggregating them to generate cloud-level usage records. The generated records are pushed to the Accounting system via a RESTful web interface.
mOSAIC: Regarding monitoring of cloud-based applications, two separate levels are distinguished: infrastructure-level and application-level monitoring. The Connector API for Monitoring uses JSON (JavaScript Object Notation) for data exchange.
4. State-of-the-Art: QoS Management
4.1. Preamble
Quality of Service (QoS) plays a central role in the optimal delivery of web services, and as more applications
are deployed on public clouds, the task of handling QoS becomes harder. As more applications share the same
infrastructure, their demand for resources may create contention that reduces the QoS perceived by the user. In
this section we review methods that have been proposed during the last decade to provide tools to system
administrators to manage QoS of web applications.
To provide a better view of this problem we have divided this section into three parts:
Data analysis and forecasting. The first part deals with the need to accurately describe the system
workload, as this drives the demand for resources. The high variability and auto-correlation present in
web application workloads call for advanced modelling approaches for prediction and evaluation.
These include statistical regression, autoregressive models, and machine learning techniques.
Runtime QoS models. This section presents recent advances in QoS runtime models, which are tools to
evaluate the performance of an application under a given workload mix, resource availability, and
resource management policy. These models attempt to predict the effect of a reconfiguration on
system performance, allowing one to predict the benefits of a reconfiguration before applying it, and to
consider future changes needed to cope with a potential change in the workload. The models
used in this section are based on statistical inference, control theory, and queueing theory.
SLA management. This section describes methods to determine resource management policies that cope
with Service Level Agreements (SLAs). SLAs exist between the application provider and end users, and also
between the application provider and the cloud provider; here we focus on the first kind, i.e., agreements with
the end users. We consider policies for application placement, admission control and capacity
allocation. To determine the most appropriate policy, optimization and game-theoretic methods are
reviewed.
4.2. QoS Data Analysis and Forecasting
4.2.1. Problem
MODAClouds will offer a data analysis platform whose main purposes include parameterizing models of cloud
applications in order to deliver predictions of their QoS metrics. Classical QoS data analysis involves service
demand estimation and traffic forecasting. Service demand estimation approximates the service demand of
different classes of requests by analysing log files or streaming data; typical metrics of interest include response
time and server utilisation. The traffic forecasting problem is to forecast the incoming workload by analysing
historical data to obtain the future trend. This brings the need for methods to keep predictive models
consistent with observations to
maximize predictive accuracy. A similar concept is adopted for example in [Shi06], [Coh04], [Des12] where the
runtime engine features statistical learning methods, classification, regression, adaptive re-learning. Machine
learning methods can be more flexible than stochastic models in capturing dependencies in empirical data.
However, they can be less accurate in what-if analysis since they take a black-box system view. For example,
since they do not model scheduling mechanisms, it is difficult to predict the effects of changes in request
priorities, which can, instead, be simple to predict with a stochastic model. Unsupervised methods may also be
inapplicable for predicting metrics that are unobservable due to overhead concerns (e.g., threading levels).
While definition of machine learning methods is usually embedded with the modelling technique itself (e.g.,
training algorithms for neural networks), less standardized data analysis methods are required to parameterize
QoS stochastic models. Since WP5 aims at leveraging such models for design-time predictions, the runtime
environment will take advantage of the WP5 models and therefore will need mechanisms to parameterize them.
QoS model parameterization can be broadly divided into direct measurement techniques and statistical inference
methods. Direct measurement parameterization, as used for instance in [Urg05], is usually expensive in terms of
overhead because it instruments the code or tracks the requests to see what they do. However, an estimation
problem can instead be formulated using statistical inference methods, which can be reapplied periodically. In the next
sections we focus on statistical inference methods for data analysis and forecasting of QoS model parameters. As
the methods share similarities in the techniques used, we provide a brief review of these techniques in the
following section.
4.2.2. Overview of data analysis, forecasting techniques, and queueing models
Before describing recent works in this area, we overview some of the main techniques used for data analysis and
forecasting.
Regression analysis
Regression is a statistical technique to estimate the relationship among observed variables. Regression
models can be formulated as Y = f(X, β), where X is the independent variable and Y is the dependent
variable. Both X and Y are observed variables and β is the parameter to estimate. The simplest regression
method is linear regression, which assumes a linear relationship Y = Xβ + ε between the variables.
Regression approaches are widely used for prediction and forecasting. Classical approaches include
ordinary least squares linear regression, which finds the linear relation that minimises the sum of
squared residuals, and non-linear methods such as SVM regression, which uses a Support
Vector Machine (SVM) to obtain a non-linear relation between variables.
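To make this concrete, the sketch below (pure Python, with made-up observations) computes the ordinary least squares estimate for the one-dimensional, no-intercept linear model; the data and the "2 ms per request" interpretation are illustrative assumptions, not taken from this deliverable.

```python
# Ordinary least squares for the no-intercept linear model y = beta * x + error.
# In this one-dimensional case the estimate has the closed form
# beta = sum(x*y) / sum(x*x), which minimises the sum of squared residuals.
def ols_slope(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

# Hypothetical observations: response time grows roughly 2 ms per request.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
beta = ols_slope(xs, ys)  # 1.99, close to the underlying slope of 2
```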
Autoregressive models
Autoregressive models are mathematical models describing time-varying processes. Classical methods
include autoregressive moving average (ARMA) models and their generalization, autoregressive
integrated moving average (ARIMA) models. ARMA models form a class of linear time series
models; by adjusting the order of the model, any linear time series model can be approximated to a
desired accuracy. Autoregressive models can also be used to forecast time series, and have been
mainly used in economics and the natural sciences.
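As a minimal illustration (an assumption-laden sketch, not one of the cited methods), an AR(1) model x[t] = φ·x[t-1] + noise can be fitted by least squares over lagged pairs and iterated forward to forecast; the decaying series below is invented for the example.

```python
def fit_ar1(series):
    # Least-squares estimate of phi in x[t] = phi * x[t-1] + noise.
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def forecast_ar1(series, phi, steps):
    # Iterate the fitted recurrence forward from the last observation.
    preds, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        preds.append(last)
    return preds

# Hypothetical workload samples decaying by half at each step.
series = [8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(series)                 # 0.5
preds = forecast_ar1(series, phi, 2)  # [0.5, 0.25]
```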
Kalman filter
The Kalman filter is a technique to estimate the state of a running system by analysing the system input
and noisy, incomplete observations. It works in two steps: first, the algorithm predicts the current
system state and its uncertainty; then, once a measurement of the system is observed, it updates the
previous estimate using a weighted average, with higher weight given to estimates with lower
uncertainty. The Kalman filter is a recursive estimator that only requires the current measurement and
the previous state, so it is suitable for online parameter estimation and adaptive management of
the system.
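A one-dimensional sketch of this predict/update cycle (assuming a random-walk state model, with invented noise values and readings) can be written in a few lines:

```python
def kalman_update(x, p, z, r, q=0.0):
    # Predict: under a random-walk state model the estimate is unchanged,
    # but its variance grows by the process noise q.
    p = p + q
    # Update: the Kalman gain weights the new measurement z (variance r)
    # against the current estimate, i.e. a precision-weighted average.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p

# Filter hypothetical noisy readings of a signal near 5.0,
# starting from a deliberately poor prior estimate of 0.0.
x, p = 0.0, 1.0
for z in [4.8, 5.2, 5.1, 4.9, 5.0]:
    x, p = kalman_update(x, p, z, r=1.0)
# x has moved from 0.0 most of the way toward 5.0; p has shrunk to 1/6.
```

With q = 0 and the prior variance equal to r, this recursion reproduces the running average of the prior and all measurements, which illustrates why only the previous state and the current measurement are needed.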
Machine Learning methods
Machine learning algorithms have been studied extensively over the last century, and their application
to queueing systems has raised much interest in the last decade. Machine learning has the advantage of
requiring no knowledge of the internal structure of the system, treating it as a black box; such methods
are therefore more flexible than stochastic models in capturing dependencies in the data. Techniques
like Support Vector Machines (SVM), Artificial Neural Networks (ANN), clustering and Bayesian
models have been studied and applied to recognise workload patterns and to predict and forecast future events.
Throughout this section we will repeatedly make reference to queueing systems and queueing networks, which
are among the most important modelling tools for QoS management. To make this document self-contained, we
provide a brief overview of these techniques, pointing to deliverable D5.1 for additional details:
Queueing systems
A queueing system is a mathematical model that consists of one or several servers that deliver a time-
consuming service to a population of clients/requests. A queueing system can be described using the
Kendall notation A/B/c/k, where A describes the request arrival process, B describes the service
process, c is the number of servers in the system, and k is the number of spaces in the system, including
service and waiting spaces. For instance, the most traditional queue is the M/M/1/∞ queue, where the
inter-arrival request times and the service times follow exponential distributions, there is one server,
and there is infinite room for holding waiting requests. Other usual values for the A and B components
include a general distribution G and the Erlang distribution Er, among others. Another relevant aspect of a
queueing system is its service discipline, which determines how the server/resource is allocated among
the incoming requests. Common service disciplines include First-Come-First-Serve (FCFS), Last-Come-First-Serve (LCFS), Processor Sharing (PS) and Generalized Processor Sharing (GPS), among
others.
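For the M/M/1/∞ queue mentioned above, the standard steady-state formulas are simple enough to compute directly. The sketch below uses textbook results (valid only when the arrival rate is below the service rate); the rates are hypothetical.

```python
def mm1_metrics(lam, mu):
    # Textbook M/M/1 steady-state results; the queue is stable only if lam < mu.
    if lam >= mu:
        raise ValueError("unstable: arrival rate must be below service rate")
    rho = lam / mu                      # server utilization
    mean_resp = 1.0 / (mu - lam)        # mean response time (waiting + service)
    mean_in_system = rho / (1.0 - rho)  # mean number of requests in the system
    return rho, mean_resp, mean_in_system

# Hypothetical rates: 2 requests/s arriving at a server handling 4 requests/s.
rho, resp, n = mm1_metrics(2.0, 4.0)  # (0.5, 0.5, 1.0)
```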
Queueing networks
A queueing network is a collection of queueing systems (each one a node in the network) that interact
through their arrival and departure processes. When a request finishes being served at a node, it may
move to another node in the network, or leave the network, according to a probabilistic routing matrix.
Further, the requests can be classified in different classes, depending on the probability laws that govern
their external arrivals, services and routing. Queueing networks are ideal to analyze systems where
several resources are accessed by external requests.
Layered queueing networks
Layered queueing networks are an extension of queueing networks that allows the representation of
computer systems composed of several layers of software servers that share hardware resources. The
layers play an important role in software applications, as they capture the blocking and waiting that a
software server (in one layer) experiences when it requests service from another server (in a lower layer)
in order to complete its own service. When performing such a request, the calling server is blocked and
unable to provide any service, a feature that product-form queueing network models are not able to
capture.
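As a small worked example of a queueing network (a sketch with hypothetical rates, not from this deliverable): in an open tandem network of two M/M/1 nodes with Poisson arrivals, Jackson's theorem lets each node be analysed as an independent M/M/1 queue, so the end-to-end mean response time is the sum of per-node response times.

```python
def tandem_response_time(lam, mus):
    # Open tandem (Jackson) network: every request visits each node once, so each
    # node sees arrival rate lam and behaves as an independent M/M/1 queue.
    if any(lam >= mu for mu in mus):
        raise ValueError("every node must satisfy lam < mu for stability")
    return sum(1.0 / (mu - lam) for mu in mus)

# Hypothetical rates: 2 req/s through nodes serving 4 req/s and 3 req/s.
r = tandem_response_time(2.0, [4.0, 3.0])  # 0.5 + 1.0 = 1.5 seconds
```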
4.2.3. Statistical Inference for QoS model parameterization
Statistical inference techniques differ from direct data measurement techniques because they aim at calibrating
QoS model parameters from aggregate statistics such as CPU utilization or response time measurements.
In [Men94], a standard model calibration technique is introduced. The technique is based on comparing the
performance metrics (e.g., response time, throughput and resource utilization) predicted by a performance model
against measurements collected in a controlled experimental environment varying the system workload and
configuration. Given the lack of control over the system workload and configuration during operation,
techniques of this type may not be applicable for online model calibration.
In [Rol95], linear regression is used for parameter estimation and is found to be accurate with less than 10%
error with respect to simulation data. However, regression fails when there is not enough variability in the
observed data. [Rol98] studies the precision of linear regression using simulation of different service time
distributions, which is shown to decrease as the service variance grows. In [Liu05], performance models are
calibrated by application-independent synthetic benchmarks. The approach uses middleware benchmarking to
extract performance profiles of the underlying component-based middleware. However, application-specific
behaviour is not modelled.
The study in [Zha07] presents a regression-based approximation of the CPU demand of customer transactions,
which is later used to parameterize a queueing network model where each queue represents a tier of the web
application. It is shown that such an approximation is effective for modeling different types of workloads whose
transaction mix changes over time. Moreover, [Cas08a] presents an optimization-based inference technique that
is formulated as a robust linear regression problem that can be used with both closed and open queueing network
performance models. It uses aggregate measurements (i.e., system throughput and utilization of the servers),
commonly retrieved from log files, in order to estimate service times. The work in [Pac08] considers the problem
of dynamically estimating CPU demands of diverse types of requests using CPU utilization and throughput
measurements. The problem is formulated as a multivariate linear regression problem and accounts for multiple
effects such as data aging.
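The utilization-law regressions discussed above can be sketched in a few lines. The samples below are hypothetical and noiseless, generated from assumed true demands of 0.01 s and 0.02 s; this is a toy version of the idea behind [Zha07] and [Pac08], not their actual algorithms.

```python
# Utilization law: U = lam1*D1 + lam2*D2 for two request classes. Given several
# (arrival rates, utilization) samples, the demands D1, D2 are obtained as the
# least-squares solution of the 2x2 normal equations, solved here directly.
def estimate_demands(samples):
    a11 = sum(l1 * l1 for (l1, l2), _ in samples)
    a12 = sum(l1 * l2 for (l1, l2), _ in samples)
    a22 = sum(l2 * l2 for (l1, l2), _ in samples)
    b1 = sum(l1 * u for (l1, l2), u in samples)
    b2 = sum(l2 * u for (l1, l2), u in samples)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical noiseless samples with true demands D1 = 0.01 s, D2 = 0.02 s.
samples = [((10.0, 5.0), 0.20), ((5.0, 10.0), 0.25), ((20.0, 2.0), 0.24)]
d1, d2 = estimate_demands(samples)  # recovers (0.01, 0.02)
```

Note that the samples must not be collinear (proportional rate vectors make the normal equations singular), which is the multicollinearity problem that [Kal12] addresses.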
In [Kal11], an on-line resource demand estimation approach is presented, together with an evaluation of the
regression techniques Least Squares (LSQ), Least Absolute Deviations (LAD) and Support Vector Regression
(SVR). Experiments with different workloads show the importance of tuning the parameters; the authors
therefore propose an online method to tune the regression parameters.
In [Kal12], a novel approach of resource demand estimation is proposed for multi-tier systems. The Demand
Estimation with Confidence (DEC) approach it proposes can effectively overcome the problem of
multicollinearity in regression methods. DEC can be iteratively applied to improve the accuracy. A thorough
evaluation demonstrates the effectiveness of the algorithm.
Other approaches to model calibration are presented in [Wu08] and [Zhe08]. Both of them use Extended Kalman
Filter for parameter tracking. While in [Wu08], a calibration framework based on fixed test configurations is
proposed, [Zhe08] applies tracking filters on time-varying systems. [Zhe08] extends [Zhe05], where the use of
an extended Kalman filter is investigated to adjust the estimated parameters based on utilization and response
time measurements. The above approaches to model calibration, however, have not been validated in scenarios
of a realistic size and complexity yet and it is currently not clear if they can be used as a basis for online model
calibration.
The study in [Cre10] proposes a method based on clustering to estimate the service time. The authors employ
density-based clustering to obtain clusters and then use a clusterwise regression algorithm to estimate the service
time. A refinement process is conducted between clustering and regression to obtain accurate clustering results
by removing outliers and merging clusters that fit the same model. This approach proves to be computationally
efficient and robust to outliers.
[Cre12] proposes an algorithm to estimate the service demands for different system configurations. A time-based
linear clustering algorithm is used to identify different linear clusters for each service demand. This approach
proves to be robust to noisy data. Extensive validation on generated datasets and real data shows the
effectiveness of the algorithm.
[Sha08] explores the problem of inferring workload classes automatically from high-level measurement of
resources (e.g., request rate, total CPU and network usage) using a machine learning technique known as
independent component analysis (ICA).
In [Sut08], the authors propose using an inference method to estimate the parameters in a queueing network.
This method can effectively overcome the problem of queueing models which require distributional
assumptions. From the perspective of graphical models, a Gibbs sampler and stochastic EM algorithm for
M/M/1 FIFO queues are proposed to estimate the parameters of the queueing network from incomplete data.
The work in [Liu06] proposes instead service demand estimation from utilization and end-to-end response times:
the problem is formulated as quadratic optimization programs based on M/G/1/PS formulas; results are in good
agreement with experimental data.
The work in [Spi11] presents a thorough investigation of the state-of-the-art in resource demand estimation
technologies. Those technologies are analysed and compared in the same environment. By adjusting the
parameters of the environment the accuracy of the algorithms can be compared and possible directions for future
research can be obtained. The following table provides a classification of the approaches reviewed according to
the main techniques employed by each of them.
Overall, regression analysis tends to be the simplest method for model parameterization. It requires an
assumption on the hidden relation between variables, such as a linear relation. The Kalman filter is suitable
for online model parameterisation because it can recursively adapt the parameters; however, this may
introduce significant overhead to the system. It is therefore suitable for closing the feedback loop in WP5 when
combined with layered queueing networks, when no short-time execution is required. Machine learning
techniques have the advantage of requiring no knowledge of the internal structure of the system. However, when
it comes to what-if analysis, machine learning cannot provide much useful information. Queueing-based
inference can provide useful insight into the system, but it requires assumptions on the queueing
distributions. The reviewed approaches are classified in Table 4.2.a, according to the techniques used.
Table 4.2.a Summary of QoS model parameterization methods
Method type References
Regression analysis [Rol95] [Rol98] [Zha07] [Kal11] [Kal12]
Kalman filter [Zhe05] [Zhe08] [Wu08]
Machine learning [Cre10] [Cre12] [Sha08] [Sut08]
Queueing-based inference [Liu06] [Sut08]
The above model parameterization methods can be compared and selected for use in MODAClouds; if
none of them is efficient enough, a novel approach may be developed. For QoS model parameterization,
accuracy is the most important property, and the above methods demonstrate their effectiveness under different
circumstances. For example, the Kalman filter can be accurate, but it requires recursive execution, which leads
to a strong overhead. Another issue for runtime model parameterization is that the approach must run within
a short time period. Regression is easily the fastest; however, it requires an assumption on the form of the
relation and may lose accuracy if the real relation is different.
4.2.4. Workload forecasting methods
There are many approaches to predicting future workload. These approaches gather extensive profiling and log
data about the running system and then use techniques such as machine learning and data mining to extract
interesting information from the data.
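As a baseline for comparison with the methods surveyed below (a minimal sketch, not one of the cited techniques), single exponential smoothing produces a one-step-ahead forecast by recursively blending each new observation into a level estimate; the request-rate samples are invented.

```python
def exp_smooth_forecast(series, alpha=0.5):
    # Single exponential smoothing: the level tracks the series with weight
    # alpha on the newest observation; the forecast for the next step is
    # simply the final level.
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level

# Hypothetical request-rate samples; the forecast leans toward recent values.
forecast = exp_smooth_forecast([100.0, 120.0, 110.0, 130.0])  # 120.0
```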
Autoregressive methods, also known as Box-Jenkins algorithms, have been proposed to forecast workload time
series. In [Lu09] the Box-Jenkins algorithms are combined with simulation technologies to incorporate risk and
uncertainty analysis. [Ver07] proposes a hierarchical framework to predict both short-term and long-term web
server workload. The authors use Dynamic Harmonic Regression (DHR) to model the long-term workload and
an autoregressive model to predict the short-term workload; the parameters of both methods are estimated using
Sequential Monte Carlo (SMC) algorithms. Experimental results show that the framework is robust to outliers
and non-stationarity in the data.
An interesting approach is to combine autoregressive methods with machine learning techniques. For instance,
the work in [Zha03] proposes a forecasting technology combining both ARIMA and Artificial Neural Network
(ANN) models. This approach takes the advantages of both ARIMA and ANN in linear and nonlinear modelling
for time series data. Experiments with real data show that the hybrid model has an improved forecasting
performance compared to the models used separately. Also, [Pow05] explores several machine learning and data
mining algorithms, such as autoregressive models, multivariate regression models and Bayesian network
classifiers, to predict the short-term performance of enterprise systems. The authors treat it as a classification
question: whether the system will meet the target performance objective within a short time period. Besides the
accuracy of the different methods, they also characterize whether each method qualifies as a stand-alone tool in
a real system; for example, the model should adapt to different systems and workloads and be able to predict
with incomplete data. Moreover, the gain in accuracy should outweigh the cost of the model's complexity.
Another example is [Wu10], which proposes using Kalman and Savitzky-Golay filters to predict grid
performance. The authors use a confidence-window approach to restrict the workload prediction to a tolerable
range, avoiding large workload fluctuations, and present an adaptive hybrid model that extends the classic
autoregression model to take the confidence windows into account and adaptively improve prediction accuracy.
Real data is used to demonstrate the effectiveness compared to existing workload forecasting technologies.
Other works based on machine learning methods include [Di12], where a workload prediction algorithm based
on a Bayes model is proposed. The objective is to predict the long-term workload and its pattern. The authors
design nine key features of the workload and use a Bayesian classifier to estimate the posterior probability of
each feature. The experiments are based on a large dataset collected from a Google data center with thousands
of machines.
Non-Bayesian machine learning methods have also received significant attention, as in [Wan05], where a web
traffic trend prediction model is proposed. The neuro-fuzzy model analyses web log data and extracts useful
information from it. The authors build a pattern analysis and fuzzy inference system to predict the chaotic trend
of both short-term and long-term web traffic with the help of cluster information obtained from a Self-Organising
Map (SOM). Empirical results demonstrate its efficiency for predicting future web traffic trends. Also, in
[Kha12], the authors propose a method to characterise and predict workload in cloud environments in order to
efficiently provision cloud resources. The authors develop a co-clustering algorithm to find servers that have
similar workload patterns; the pattern is found by studying the performance correlations of applications on
different servers. They then present a Hidden Markov Model (HMM) method to identify the temporal
correlations between different clusters and use this information to predict workload variation in the future.
Methods based on trend and pattern recognition technologies are used in [Gma07] to propose a workload
demand prediction algorithm. The objective of this approach is to find a way to efficiently use a resource pool
when allocating servers to different workloads. The patterns and trends of the workloads are first analysed, and
synthetic workloads are then created to reflect the future behaviour of the workloads. From the synthetic
workloads, placements of the workloads among the servers can be suggested so as to minimise the number of
servers used and to balance load. A related approach is taken in [Hol10], where a periodicity detection approach
is proposed. The objective is to predict workload changes in enterprise DBS, which often exhibit periodic
patterns. Two methods for detecting periodic patterns are proposed: a discrete Fourier transform method and
an interval analysis method. An algorithm is presented to relate the knowledge of periodic patterns to
workload changes.
Table 4.2.b presents a classification of the methods reviewed according to the underlying technique.
Table 4.2.b Summary of Workload forecasting methods
Method type References
Autoregressive model [Lu09] [Zha03] [Ver07] [Pow05] [Wu10]
Regression model [Pow05] [Ver07]
Kalman filter [Wu10]
Machine Learning (Bayesian) [Pow05] [Ver07] [Di12]
Machine Learning (Non-Bayesian) [Zha03] [Wan05] [Kha12]
Pattern Analysis (Recognition) [Gma07] [Hol10]
4.3. Run-Time QoS Models
Common approaches used by system administrators to characterize the runtime execution of complex software
systems include direct measurement techniques, such as bytecode instrumentation via aspect-oriented
programming [Mar12]. These monitoring approaches focus on acquiring extensive profiling and log data about
the offered QoS and then provide the ability to execute statistical analysis and data mining methods to extract
interesting information about the system in execution. While this procedure is in general very important for
understanding the properties of a system at runtime, it does not per se provide mechanisms to help reason about
how such a system could be optimized. Such mechanisms include, for example, the ability to condense this
information into mathematical models that can be integrated within numerical optimization programs in order
to find the best choice for a decision parameter. Another example is determining the correlation between a
request's resource consumption on one server and the resource consumption it requires on a different server.
While footprinting methods exist to track the identity of a transaction across a distributed system, they are not
always adopted, and furthermore they do not allow one to clearly map the resource consumption of a request
across all the software and hardware layers that contribute to its processing. Hence, statistical reasoning is
needed to understand such correlations from monitoring data.
Several works have attempted to use statistical learning methods, such as classification, regression and adaptive
re-learning, to characterize a system in execution at runtime and operate predictions on its QoS. Others have
focused on the estimation and tracking of the system state by means of control theoretic methods such as Kalman
filters. Yet another set of works have adopted models that describe the inner structure of the system modelled
and/or the architecture of the deployed software application. These models are typically queueing networks and
layered queueing networks. In the following sections we describe recent developments in each of these
directions.
The techniques used are closely related to the ones presented in Section 4.2, including statistical inference,
control-theoretic and queueing-based methods; for a brief review of the main techniques mentioned in the
following sections we refer the reader to Section 4.2.3. Furthermore, product-form queueing networks and
layered queueing networks are also used for design-time analyses. Research works focusing on design time are
discussed in MODAClouds deliverable D5.1 (Sections 5.2 and 7). Here the works focussing on run-time
problems are considered.
4.3.1. Statistical learning models
[Aga07] presents E2EProf, a toolkit capable of tracking the end-to-end behaviour of requests in a distributed
enterprise application, such as those that are commonly migrated to the cloud. The approach looks at network
packet traces to reconstruct non-intrusively the path of a high-level request across a distributed system. A time
series approach is utilized in which cross-correlation between events in the traces is used as a driver for inference
and establishing which software components have been utilized by a transaction. The authors report that the
system has been applied to production systems.
The works in [Coh04, Coh05] illustrate a methodology to predict correlation among system states based on Tree-
Augmented Networks, an efficient class of Bayesian networks. Given a monitoring trace, the approach involves
defining an ensemble of models that is continuously learned. Such models attempt to describe the probabilistic
law that puts in relation a QoS metric (e.g., CPU utilization, memory consumption, etc) with an SLO state of
compliance (service objective achieved, service objective violated). Scoring methods are used to select the best
submodel in the ensemble for estimating the SLO state over a moving window; the required sample size
can be obtained from a learning surface.
More recently, [Gam12] proposes runtime QoS models in which a controller maintains a Kriging model for each
target SLO. Kriging describes the correlation among the errors of a prediction model; it thus differs from
regression methods, which focus on providing a prediction under given modelling assumptions rather than a
description of the resulting error. The Kriging approach is based on radial basis functions, data interpolators
long used in pattern recognition. Essentially, they are useful in situations where errors are correlated, a
circumstance that regression models find more problematic to handle. Initial results of this approach indicate
that Kriging models can lead to controllers delivering very low, even negligible, SLO violation errors.
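Since Kriging predictors are built on radial basis functions, the mechanics can be illustrated with a plain Gaussian-RBF interpolator. This is a minimal sketch, not [Gam12]'s controller; the CPU-share/response-time data points and the shape parameter are invented for the example.

```python
import numpy as np

def rbf_fit(x, y, eps=3.0):
    """Fit a Gaussian radial-basis-function interpolant through (x, y)."""
    phi = np.exp(-(eps * (x[:, None] - x[None, :])) ** 2)
    return np.linalg.solve(phi, y)

def rbf_predict(x_train, w, x_new, eps=3.0):
    """Evaluate the fitted interpolant at new points x_new."""
    phi = np.exp(-(eps * (x_new[:, None] - x_train[None, :])) ** 2)
    return phi @ w

# Hypothetical SLO surface: response time (s) observed at a few CPU shares.
cpu = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
rt = np.array([2.5, 1.4, 0.9, 0.7, 0.6])
w = rbf_fit(cpu, rt)
pred = rbf_predict(cpu, w, np.array([0.5]))
```

Unlike a parametric regression, the interpolant passes exactly through the observed points, and a full Kriging model would additionally quantify the uncertainty between them.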
vPerfGuard [Xio13] is a controller capable of automatically identifying metrics that are predictive of application
performance and of adapting dynamically to changes in such metrics. Compared to other controllers, this approach
aims at identifying the most important metrics for prediction using a machine learning approach. Correlations
across metrics are considered for feature selection. Subsequently, modelling is performed using methods such as
linear regression, k-nearest neighbours (k-NN), regression trees, and boosting, which are compared for their
predictive capabilities. Similar methods are adopted, for example, in [Shi06], [Coh04], and [Des12], where a
runtime engine is proposed that features statistical learning, classification, regression, and adaptive re-learning.
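The correlation-driven feature-selection step followed by a simple regression fit can be sketched as follows. This is illustrative only: the metric names and the synthetic workload are assumptions, and vPerfGuard itself compares several model families rather than fitting just one.

```python
import numpy as np

def select_features(X, y, k):
    """Rank candidate metrics by |Pearson correlation| with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))[:k]

# Synthetic monitoring data: one informative metric, one irrelevant one.
rng = np.random.default_rng(1)
n = 300
cpu = rng.uniform(0, 1, n)            # informative metric
noise_metric = rng.uniform(0, 1, n)   # irrelevant metric
latency = 0.5 + 2.0 * cpu + rng.normal(0, 0.05, n)
X = np.column_stack([noise_metric, cpu])
top = select_features(X, latency, k=1)

# Fit a least-squares model on the selected metric only.
A = np.column_stack([np.ones(n), X[:, top[0]]])
coef, *_ = np.linalg.lstsq(A, latency, rcond=None)
```

Pruning the metric set before fitting keeps the model small and makes re-learning cheap when the workload shifts.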
IRONModel [The08] is a performance management system that maintains a model of a distributed system by
dynamically analyzing its traces and automatically discovering new correlations between performance metrics
and system attributes. The model is initially built by the system designers and incorporated in the system. The
underlying modelling approach is based on zero-training classification and regression trees (Z-CART). The
underlying models rely in part on operational analysis and bounding analysis laws developed in the context of
queueing theory; however, the approach combines these formulas in a machine learning framework. Compared to
other approaches in this section, IRONModel also features active probing to accelerate training.
Reinforcement learning has also been proposed as a method to build run-time QoS models [Tes05]. Although
these methods may provide good results without specifying an underlying traffic model, they also require
significant online training, which can be very expensive in production systems. To mitigate this, hybrid methods
[Tes06] have been considered, where the initial policy is provided by an analytic model and is afterwards
improved by solutions found by a reinforcement learning algorithm trained offline on previously collected
information.
[Tan12] introduces PREPARE, an online anomaly prediction and virtualization-based prevention system. Its
anomaly detection module consists of a 2-state Markov model to predict the future values of relevant attributes,
and a tree-augmented Bayesian network model to classify those future states between normal and abnormal. In
addition, it provides a module to determine the faulty VMs causing the anomaly, as well as an actuator module
that performs preventive actions to avoid SLO violation states.
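The 2-state Markov component of such a predictor can be sketched as follows. This is a minimal illustration with a synthetic trace, not PREPARE's implementation.

```python
import numpy as np

def fit_two_state_markov(states):
    """Estimate the 2x2 transition matrix of a binary (0/1) state sequence."""
    counts = np.zeros((2, 2))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(P, current, horizon=1):
    """Distribution over the two states `horizon` steps ahead."""
    dist = np.eye(2)[current]
    for _ in range(horizon):
        dist = dist @ P
    return dist

# Illustrative attribute trace discretized to normal (0) / abnormal (1);
# abnormal periods are "sticky", i.e. tend to persist once entered.
trace = [0] * 50 + [1] * 10 + [0] * 40
P = fit_two_state_markov(trace)
dist = predict_next(P, current=1, horizon=1)
```

In a full system the predicted future attribute values would then be fed to a classifier (such as the tree-augmented Bayesian network) to flag impending anomalies.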
Many of the statistical learning methods mentioned so far, such as [Tan12, Coh05, Dua09], rely on labelled
training data, i.e., data from the production system that includes monitoring metrics and annotations of whether
the system is violating an SLO or not. As such data are not readily available in most systems, [Dea12] introduces
an unsupervised learning algorithm that is able to predict anomalies in a virtualized data centre without the need
for training data. To this end, the authors rely on the Self-Organizing Map (SOM) method, which can describe
complex system behaviours at a smaller computational cost than other unsupervised methods.
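A minimal Self-Organizing Map fits in a few lines; the sketch below trains a tiny 1-D map on "normal" metric vectors and scores new samples by their distance to the best-matching unit. All data, sizes, and learning parameters are illustrative (real SOMs typically decay the learning rate and neighbourhood width), and [Dea12]'s actual system differs.

```python
import numpy as np

def train_som(data, n_units=4, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Train a 1-D self-organizing map on metric vectors (rows of `data`)."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))   # best-matching unit
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))
            w += lr * h[:, None] * (x - w)                # pull neighbourhood
    return w

def anomaly_score(w, x):
    """Distance to the best-matching unit: high means unusual behaviour."""
    return float(np.sqrt(((w - x) ** 2).sum(axis=1).min()))

# Normal operation clusters around (0.3 CPU, 0.4 memory); score a spike.
rng = np.random.default_rng(2)
normal = rng.normal([0.3, 0.4], 0.02, size=(200, 2))
w = train_som(normal)
s_normal = anomaly_score(w, np.array([0.31, 0.39]))
s_spike = anomaly_score(w, np.array([0.95, 0.95]))
```

Because the map is learned from unlabelled traces, no SLO-violation annotations are needed, which is the key advantage argued in [Dea12].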
[Mal11] employs a multi-model for n-tier application control, where an empirical model learns the best decisions
for each possible configuration and workload. As initially the system under control has no logs to learn from, the
decisions are taken based on another model, in this case a Horizontal Scale Model. Once a decision is known for
a given configuration, the empirical model takes over and applies the decision already known as the best for that
configuration. Although initially proposed as relying on the Horizontal Scale Model, this meta-model can actually
operate with any of the models proposed in this or the following sections.
4.3.2. Control theory models
Some works have instead attempted to use modelling techniques based on control-theory, such as Kalman filters
[Kal09] and Linear parametrically varying (LPV) models [Tan10]. Control theory has also provided a framework
to analyse the behaviour of policies for autonomic control. [Dut10] uses this framework to analyse the challenges
of threshold-based and reinforcement learning approaches, considering aspects that affect the stability of an
autonomic system, such as the latency and power of the controller, and oscillations in the input variables.
Kalman filters have been applied to control resource consumption in runtime web applications in works such as
[Zhe05], [Kal09]. Here we discuss the underlying resource consumption models. [Zhe05] uses a modelling
methodology based on layered queueing models, which are reviewed in Section 4.3.4. Conversely, [Kal09]
illustrates the application of feedback-loop models used in control theory to distributed systems. It proposes
three Kalman filters to model the dynamics of a software application and applies them to the control problem
showing good accuracy. The filters are respectively based on a Single-Input Single-Output (SISO) model
relating input workload and CPU utilization, a Multiple-Input Multiple-Output (MIMO) model relating the
covariances between VM utilizations, and an adaptive, self-configuring version of the latter.
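The SISO case can be sketched as a scalar Kalman filter that tracks the per-request CPU cost in an observation model util = c * lam + noise. This is a simplified illustration under assumed random-walk dynamics and invented noise levels, not the filters of [Kal09].

```python
import numpy as np

def kalman_track(lam, util, q=1e-6, r=1e-4):
    """Scalar Kalman filter tracking the per-request CPU cost c in the
    SISO observation model util = c * lam + noise; c is a random walk."""
    c, P = 0.0, 1.0                     # state estimate and its variance
    for l, u in zip(lam, util):
        P += q                          # predict: random-walk process noise
        K = P * l / (l * P * l + r)     # Kalman gain for observation u = l*c
        c += K * (u - l * c)            # correct with the measurement
        P *= (1.0 - K * l)
    return c

# Illustrative data: true cost of 12 ms of CPU per request.
rng = np.random.default_rng(3)
lam = rng.uniform(10.0, 50.0, 500)                # request rate (req/s)
util = 0.012 * lam + rng.normal(0.0, 0.01, 500)   # measured CPU utilization
c_hat = kalman_track(lam, util)
```

Because the parameter is modelled as a random walk, the same filter keeps tracking the cost if it drifts at run time, which is what makes the approach attractive for feedback control.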
LPV models are a class of control-theoretic methods that describe the dynamics of a complex system in terms of
an input and a set of so-called scheduling variables, which describe the operational condition of the system
[Lee99], [Nem95], [Lov98], [Bam99], [Ver02]. An LPV model is linear in the parameters, and the vector of
scheduling variables enters the system matrices in an affine or linear-fractional way. Both single-input
single-output (SISO) and multiple-input multiple-output (MIMO) state-space LPV models have been considered
in the literature. For example, LPV methods have been investigated in [Ver02], [Van09]
and their performance assessed on experimental data measured on a custom implementation of a workload
generator and micro-benchmarking Web service applications. The results show that the LPV framework is very
general, since it allows describing the performance of an IT system by exploiting all of the available technical
parameters to manage QoS.
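A minimal scalar example shows the affine parameter dependence: the pole of the system moves with the scheduling variable, so the same input drives very different steady states. The values are illustrative and not taken from the cited works.

```python
import numpy as np

def simulate_lpv(a0, a1, b, u, p):
    """Simulate a scalar LPV system x[k+1] = (a0 + p[k]*a1) * x[k] + b*u[k],
    where p[k] is the scheduling variable (e.g., an operating-regime index)."""
    x = np.zeros(len(u) + 1)
    for k in range(len(u)):
        x[k + 1] = (a0 + p[k] * a1) * x[k] + b * u[k]
    return x

n = 200
u = np.ones(n)              # constant workload input
p_light = np.zeros(n)       # light-load regime: pole a0 = 0.5
p_heavy = np.ones(n)        # heavy-load regime: pole a0 + a1 = 0.9
x_light = simulate_lpv(0.5, 0.4, 1.0, u, p_light)
x_heavy = simulate_lpv(0.5, 0.4, 1.0, u, p_heavy)
```

In the light regime the state settles at b*u/(1-0.5) = 2, in the heavy regime at b*u/(1-0.9) = 10: a single linear model could not capture both behaviours, which is the point of scheduling-variable dependence.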
[Tan10] introduces an LPV model to identify the dynamics of a web service, and defines an optimal control
problem based on this model. The solution of this optimal control problem is then used to define an optimal
policy to manage the trade-off that arises between the QoS guarantees and the energy consumption. In [Gia11]
the stability properties of an LPV-based proportional controller are analysed. The controller is designed for
admission control in web services and the LPV model is used to design the controller.
[Lim10] proposes a proportional threshold control for elastic storage in cloud platforms. The controller explicitly
considers the resources as discrete quantities, which is in line with per-instance pricing in platforms such as EC2.
The controller also considers the actuator lag generated by the delay of redistributing data to new storage servers.
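The flavour of such a controller can be sketched as a single discrete-resource control step: act only when measured utilization leaves a band around the target, and resize the integer pool so that utilization returns to the target. This is a simplified rendition of the idea, not [Lim10]'s actual policy; the target, band, and pool sizes are invented.

```python
import math

def threshold_scale(n, util, target=0.6, band=0.1):
    """One control step for a discrete pool of storage nodes.
    Outside the band [target-band, target+band], resize the pool
    proportionally (n * util / target), rounded to a whole node."""
    if util > target + band or util < target - band:
        return max(1, round(n * util / target))
    return n

n_out = threshold_scale(4, util=0.9)    # overload: scale out
n_in = threshold_scale(6, util=0.3)     # underload: scale in
n_hold = threshold_scale(4, util=0.65)  # inside the band: do nothing
```

Keeping a dead band avoids oscillating around the target, which matters when each actuation (adding or draining a storage node) is slow and costly.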
Other approaches include the use of fuzzy logic [Xu07] to design a two-level controller for resource allocation in
a virtualized datacentre. Fuzzy logic differs from Boolean logic in that the membership of an element in a set is
not restricted to 0 or 1, but can be any real number in the interval [0,1]. With this generalization, [Xu07]
proposes a model that learns the relationship between workload and resource demand for a given QoS level.
From this model, inference functions are derived that determine the appropriate resource allocation for a given
workload.
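The basic machinery of fuzzy membership functions and a defuzzified control output can be sketched as follows. This is a toy example with invented membership shapes and rules, not [Xu07]'s controller.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peaking at 1 in b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def allocation_delta(util):
    """Defuzzified resource adjustment (in node units) for a CPU
    utilization reading, using three fuzzy sets and three rules."""
    low = triangular(util, -0.01, 0.0, 0.5)
    ok = triangular(util, 0.3, 0.6, 0.9)
    high = triangular(util, 0.5, 1.0, 1.01)
    # Rule outputs: low -> release 1 unit, ok -> 0, high -> add 1 unit;
    # combine by the weighted average (centre of gravity) of fired rules.
    num = -1.0 * low + 0.0 * ok + 1.0 * high
    den = low + ok + high
    return num / den if den else 0.0

d = allocation_delta(0.95)   # strongly overloaded: add a full unit
```

Because the sets overlap, intermediate readings fire several rules at once and yield graded, rather than all-or-nothing, allocation changes.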
4.3.3. Product-form queueing networks
Queueing networks have been among the first methods used for runtime control of software systems. Their
distinguishing feature compared to the models described above is their ability to consider white-box information
about a system in the runtime prediction. Often, this does not imply major limitations from a computational point
of view, since efficient iterative algorithms and fluid methods exist to approximate the solution of such models
in a short amount of time. Recently, [Cas08] shows how such models can be applied to integrate a more realistic
description of the application workloads, including burstiness and fluctuations in the surrounding operating
environment (e.g., network bandwidth fluctuations in the cloud).
Early work in [Men03, Ben04] focuses on e-commerce sites and shows how queueing network models can be
used with combinatorial search techniques to determine an optimal system configuration; periodic execution
allows adaptation at runtime. Variants of such models have subsequently been studied in works such as [Ben05,
Men07, Men05] in various application areas, including data centres. Urgaonkar et al. were able to validate a
basic product-form queueing network for the RUBiS and RUBBoS [Rub] open-source multi-tier benchmark
applications [Urg05]. They also considered various non-product-form extensions to the model to better account
for several important features of the applications under study, e.g., an imbalance of load across multiple
application servers. Chen et al. represent the TPC-W [Tpc] and RUBiS benchmark multi-tier applications as
multi-station queues, where the multiplicity refers to the number of server processes in each tier [Che08]. They
use an approximation [Sei87] that transforms a multi-station queueing network model into an equivalent single-
station product-form queueing network model which can be solved using MVA. Lu et al. used simple product-form
models in conjunction with a feedback controller to perform runtime optimization of a single-tiered Apache
Web server system [Lu03]. [Zha07] presents a queueing network model where each queue represents a tier of a
web application, which is parameterized by means of a regression-based approximation of the CPU demand of
customer transactions. It is shown that such an approximation is effective for modeling different types of
workloads whose transaction mix changes over time.
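The exact MVA algorithm referenced above is short enough to sketch for a closed network of single-server stations; the two-tier service demands and population below are illustrative.

```python
def mva(demands, think_time, n_users):
    """Exact Mean Value Analysis for a closed, product-form queueing
    network of single-server stations; returns (response_time, throughput).
    `demands` are per-visit service demands in seconds."""
    q = [0.0] * len(demands)                 # mean queue length per station
    resp, x = 0.0, 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving job sees the (n-1)-job network.
        r = [d * (1.0 + qi) for d, qi in zip(demands, q)]
        resp = sum(r)
        x = n / (resp + think_time)          # throughput via Little's law
        q = [x * ri for ri in r]
    return resp, x

# Illustrative two-tier model: 40 ms at the app server, 20 ms at the DB,
# 1 s of user think time, 50 concurrent users.
resp, x = mva([0.04, 0.02], think_time=1.0, n_users=50)
```

The recursion over population sizes is what makes exact MVA linear in the number of users, and hence cheap enough for periodic re-evaluation at run time.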
4.3.4. Layered queueing networks
The main limitation of ordinary queueing network models is that they describe the resource consumption
mechanisms of the software, but they do not explicitly take into account known information about the software
architecture. Layered queueing models (LQM) [Rol95, Woo95] are an extension to queueing networks that
allows the representation of computer systems composed of several layers of software servers that share
hardware resources, and have therefore been extensively applied to software system research. LQMs were
developed starting in the 1980s to consider the performance impact of contention for software resources, e.g.
server threads, and the interactions between software entities at various system layers, e.g., messaging between
an application server and a database server. The approach decomposes an LQM into a hierarchy of queueing
network models. Each model in the hierarchy is solved using approximate mean value analysis, and the solution
process is repeated until the individual estimates of the models are all mutually consistent.
Approximate mean value analysis [Cha82, Cre02] is a technique that allows queueing network models to be
solved iteratively in a very efficient manner thereby permitting the study of larger systems and the solution of
models at runtime. However, the technique relies on product-form assumptions which restrict its applicability. In
particular, behaviour commonly observed in complex enterprise systems such as contention for software
resources, synchronous and asynchronous request-reply relationships between software entities, and priority
based resource access all violate product-form assumptions [Alt06].
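A widely used approximate MVA scheme is the Bard-Schweitzer fixed point, which replaces the exact recursion over population sizes with an iteration whose cost is independent of the number of users; this is part of what makes run-time use attractive. The sketch below handles single-server stations with illustrative parameters.

```python
def amva(demands, think_time, n_users, tol=1e-10, max_iter=100000):
    """Bard-Schweitzer approximate MVA for a closed network of
    single-server stations; returns (response_time, throughput)."""
    k = len(demands)
    q = [n_users / k] * k                    # initial queue-length guess
    for _ in range(max_iter):
        # Schweitzer approximation: an arriving job sees (N-1)/N
        # of the time-averaged queue at each station.
        r = [d * (1.0 + (n_users - 1) / n_users * qi)
             for d, qi in zip(demands, q)]
        x = n_users / (sum(r) + think_time)
        q_new = [x * ri for ri in r]
        if max(abs(a - b) for a, b in zip(q_new, q)) < tol:
            break
        q = q_new
    return sum(r), x

resp, x = amva([0.04, 0.02], think_time=1.0, n_users=50)
```

The result is close to exact MVA for this kind of model while requiring only a handful of floating-point operations per iteration, regardless of population size.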
As mentioned in Section 4.3.2, [Zhe05] uses a modelling methodology based on layered queueing models,
together with an extended Kalman filter for parameter estimation. They considered a time-varying web
application, which is modelled as an LQM due to the interdependencies of its components (web server,
database). Parameters such as the clients' think time and the CPU and disk demands vary with time, and their
values are estimated by the Kalman filter. With these estimated values, the LQM is parameterized and (SLA-
driven) performance results are obtained. These results can then be used by an autonomic controller to make
decisions regarding resource allocation to prevent SLA violations. Other works in this area include [Lit05,
Woo05].
[Jun09] presents a runtime adaptation engine that allows the automatic reconfiguration of multi-tier web
applications. The engine first evaluates the potential benefits of a reconfiguration based on an LQM and its
associated costs. Based on these the engine chooses the optimal sequence of reconfigurations to be applied on
the web application. The engine is evaluated with the RUBiS benchmark [Rub] and shows a significant
reduction in SLA violations. This has been extended in [Jun10] to consider power costs, including those caused
by the transient behaviour generated by the reconfiguration.
4.3.5. Summary
From the previous discussion, it is clear that the area of QoS runtime modeling has received significant interest
in the last decade. The contributions reviewed in this section are summarized in Table 4.3.a below. When
considering the different options available for QoS runtime models, it is important to note that the selection
of the modelling technique is closely related to the information available to the performance analyst.
Statistical learning and most of the control theory-based methods assume a black-box approach, where little or
nothing is known about the application inner workings and architecture. As a result, these methods may be able
to capture the result of a reconfiguration that has been considered in the past, but may not be able to adequately
predict the performance implications of a completely new configuration. On the other hand, methods relying on
queueing networks and layered queueing networks consider more information about the specifics of the
application, and can therefore better predict the results of a new configuration. However, this additional
information may not always be available, especially for the owners of the cloud infrastructure, for whom the
application may indeed be a black box. Some methods, such as [Zhe05], actually combine both approaches, using
control-theory models to parameterize layered queueing networks that describe the underlying architecture of the
application.
Table 4.3.a. Summary of run-time modelling methods

Statistical learning: [Aga07] [Coh04] [Coh05] [Gam12] [Xio13] [SBC06] [CCGTS04] [DWSPV12] [The08] [Tes05] [Tes06] [Tan12] [Dua09] [Dea12] [Mal11]
Control theory: [Kal09] [Tan10] [Dut10] [Zhe05] [Son12] [Vaq08] [Gia11] [Lim10] [Xu07]
Queueing networks: [Cas08] [Men03] [Ben04] [Ben05] [Men07] [Men05] [Urg05] [Che08] [Sei87] [Lu03] [Zha07]
Layered queueing networks: [Cha82] [Cre02] [Alt06] [Zhe05] [Lit05] [Woo05] [Jun09] [Jun10]
4.4. SLA Management
Many solutions have been proposed for the management of Cloud services at run-time, each seeking to meet
application requirements while controlling the underlying infrastructure. Five main problem areas have been
considered in the design of resource management policies: 1) provider selection, 2) application/VM placement,
3) admission control, 4) capacity allocation, and 5) load balancing.
The following discussion examines how these problems are addressed and classifies the approaches according to
theoretical and applied criteria, in line with the related research developed by the scientific community. Figure
4.4.a summarizes the classification criteria we adopt, which will be examined in detail in the next three
subsections. A similar approach is followed in the state-of-the-art review presented in Deliverable D5.1 which,
unlike this document focusing on run-time techniques, surveys the literature on design-time approaches to
Cloud-related problems.
Figure 4.4.a. Classification criteria for SLA run-time management solutions.
4.4.1. Problem
The first category we consider is related to the problem the approaches aim to solve in the real world; every
approach tries to achieve a certain goal in a specific context. As a first classification of the literature we consider
the perspective, i.e., the actor optimizing the use of resources. Many proposals take the perspective of the Cloud
provider, whose goal is to determine the optimal configuration of the underlying infrastructure in order to satisfy
incoming requests from the end-users while minimizing some cost metric (e.g., energy). In the opposite
perspective, the actor involved in resource management optimization is the Cloud end-user, who performs Cloud
resource allocation according to application needs, minimizing the cost of use of Cloud resources. This latter
approach is the one that will be pursued within the MODAClouds project.
Most approaches aim to minimize costs; others seek to ensure high performance or high availability of the
system, and some aim to guarantee these goals simultaneously. Since the nature and the architecture of a system
are difficult concepts to define, it is useful to categorize quantifiable quality attributes such as performance, cost,
availability, reliability, safety, security, and energy consumption.
Furthermore, the set of optimized quality attributes can be aggregated into a single mathematical function or
decoupled into conflicting objectives: the first case optimizes a single quality attribute only (single-objective
optimization, SOO), while the second optimizes multiple quality attributes at once (multi-objective optimization,
MOO). Often, for a nontrivial multi-objective optimization problem, no single solution simultaneously optimizes
every objective; in that case the objective functions are said to be conflicting, and there exists a (possibly
infinite) set of Pareto-optimal solutions. Some approaches encode priority criteria by collapsing the MOO
problem into a single weighted mathematical function (multi-objective weighted, MOW), while others use
specifically designed functions.
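The MOW scalarization can be illustrated in a few lines: with two conflicting objectives, different weight choices select different Pareto-optimal configurations. The candidate configurations and weights below are invented for the example.

```python
def weighted_sum(costs, perf, w_cost, w_perf):
    """Scalarize two conflicting objectives (minimize cost, maximize
    performance) into a single score to be minimized, as in MOW."""
    return [w_cost * c - w_perf * p for c, p in zip(costs, perf)]

# Three hypothetical configurations: (hourly cost, throughput score).
costs = [1.0, 2.0, 4.0]
perf = [10.0, 18.0, 20.0]

scores_balanced = weighted_sum(costs, perf, 1.0, 0.2)
best_balanced = scores_balanced.index(min(scores_balanced))

scores_frugal = weighted_sum(costs, perf, 1.0, 0.05)
best_frugal = scores_frugal.index(min(scores_frugal))
```

With a higher weight on performance the mid-range configuration wins; with a frugal weighting the cheapest one does, showing how the weight vector picks a point on the Pareto front.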
Besides the dimensionality, each problem can be further characterized by the quality constraints that represent
additional attributes or other system properties. Constraints include structural constraints and performance
constraints, such as a minimum throughput for the applications, available memory, limits on the overall resource
costs, a fixed budget for the energy costs of the infrastructure, or a response time constraint. In some cases no
constraints are present.
4.4.2. Solution
The problems faced at run-time can be further analyzed on the basis of the solution category. We classify the
approaches according to how they achieve the optimization goal and thus describe the main steps of the
optimization process. First, solutions can be classified as centralized or distributed, according to the framework
and to the interplay between the system components; alternatively, there are hierarchical solutions, where the
resources are managed by introducing multiple decision points (e.g., a high-level controller assigns applications
to clusters of physical servers, while a second-layer controller determines the optimal capacity allocation among
applications within the same cluster).
Within each problem, the solution is characterized by the Decision Variables (DVs) available (e.g., provider
selection, application placement, capacity allocation, load balancing, admission control). In other words, the DVs
indicate which changes to the system are considered by the underlying optimization process.
Furthermore, approaches can be characterized according to the representation of the system under study. Firstly,
the architecture representation classifies solutions based on the information used to describe the problem
structure and configuration: according to the input required, the representation can be an architectural model,
UML (Unified Modeling Language), an ADL (Architecture Description Language), or an optimization model
(linear or non-linear). Secondly, concerning the solution technique, two main categories of optimization
strategies can be identified: those using exact methods and those providing approximate solutions. Exact
methods can be standard (such as branch-and-bound or dynamic programming) or problem-specific. Among
approximate ones, heuristic methods require problem- or domain-specific information to perform the search,
while meta-heuristic methods apply high-level search strategies; the latter may exploit, for example, local search,
Evolutionary Algorithms such as Genetic Algorithms, Simulated Annealing, or bio-inspired methods.
Another characteristic that differentiates the various search and solution methods is constraint handling, which
describes the strategies used to handle constraints; more precisely, this category distinguishes whether they are
treated as hard constraints or as soft constraints with associated penalties.
Finally, solutions are classified according to the time scale used which can range from a daily or hourly scale up
to the granularity of minutes, in some cases even seconds.
4.4.3. Discipline
Finally, the techniques used to solve these run-time service management problems take advantage of various
disciplines, ranging from mathematics to computer science. Among the most used are control-theory methods,
machine learning, and utility-based methods, which combine performance models and optimization models. For
a detailed discussion and analysis of the disciplines see also [Ard12c, Ard08].
Furthermore, as an orthogonal classification, we can distinguish between pure optimization and game-theory-
based approaches. In pure optimization approaches, a single actor optimizes his own goals, with various
techniques and objectives, without interacting with other actors. Vice versa, in game-theory approaches the
interaction across different actors is non-negligible and, while pursuing his own goal, each actor (e.g., a cloud
end-user) can be affected by the actions of other actors (e.g., other end-users of the same cloud provider), not
only by his own actions.
4.4.4. State of the art
The next two sections present some of the most significant works carried out in the last few years on Cloud
service SLA management. First, pure optimization approaches are discussed (see Section 4.4.c), and later game-
theory-based solutions are considered.
Pure optimization approaches
The literature has been reviewed according to the complex taxonomy depicted in Figure 4.4.a. Many categories
and sub-categories have been considered. Tables 4.4.a to 4.4.l represent a useful and direct way to partition the
state-of-the-art literature from a specific point of view.
In what follows, a brief description of the most important works published in the last few years is presented. The
papers are grouped according to the Decision Variables category. Notice that, although many works could appear
several times because they involve several decision variables, we report each only once; the other decision
variables are nevertheless mentioned.
Provider Selection
The works listed below have in common the fact that the methods they propose consider the selection of a
different provider at run-time.
In [Dut12] the authors present SmartScale, an autoscaling framework that uses a combination of vertical (adding
more resources to existing VM instances) and horizontal (adding more VM instances) scaling mechanisms,
together with the selection of the most suitable provider. This method ensures that each application is scaled so
as to optimize both resource usage and the reconfiguration costs incurred by the scaling process itself.
In a similar way, [Xia12] describes the implementation of a system that provides automatic scaling for Internet
applications. Each application is encapsulated in a single VM, and the system scales up and down, minimizing
costs and energy consumption and maximizing throughput, while also deciding application placement and load
distribution by means of a colour-set algorithm.
Finally, [Xio11] addresses the twofold challenge of minimizing the total amount of resources while meeting the
end-to-end performance requirements of N-tier web applications. Open and closed workloads are considered as
input for an adaptive PI controller, and an SLA-based control method leads to an exact solution minimizing the
average response time.
Application placement
The application placement problem, together with the dynamic resource allocation problem, is addressed and
optimally solved in [Had12], where a minimum-cost maximum-flow algorithm is proposed. The solution is based
on a bin-packing algorithm combined with a prediction mechanism.
An opportunistic scheduling approach, instead, is proposed in [He12], where parallel tasks are considered and
low-priority tasks are allocated to underutilized computation resources left by high-priority tasks. A model
representing tasks as ON/OFF Markov chains is presented.
In [Cap10], the SOS Cloud project is presented. The project aims at providing robust and scalable solutions for
service deployment and resource provisioning in a cloud infrastructure. The project has a double objective:
meeting the service level agreement and minimizing the required cloud resources. The algorithms developed
have the additional benefit of taking advantage of cloud elasticity, allocating and deallocating resources to help
the services respect their contractual SLAs.
Lastly, a bio-inspired cost minimization mechanism for data-intensive service provision is proposed in [Wan12].
The mechanism uses some bio-inspired concepts and mechanisms to manage data application services, to create
a large services cluster and to produce optimal composition solutions. The authors propose a multi-objective
genetic algorithm capable of returning a set of Pareto-optimal solutions.
Capacity allocation
As far as the capacity allocation decision variable is concerned, the literature abounds with works considering it
as part of the proposed solution.
In [Bjo12], for instance, the authors discuss an opportunistic service replication policy that leverages the VM
workload and performance variability, as well as on-demand billing pricing models to ensure response time
constraints, while achieving a target system utilization for the underlying resources.
Alternatively, one can mention [Gou11], where a Force-directed Resource Assignment (FRA) heuristic is used to
optimize the total expected profit obtained from processing, memory, and communication resources. Moreover,
the results of the proposed approach are compared with those attained by relaxing the capacity constraints, which
represent upper bounds for the original problem.
Furthermore, in [Zam12] the authors show the current limitations of Cloud computing providers that allocate
their VMs with off-line mechanisms based on fixed prices or auctions. Improvements are demonstrated by
implementing an on-line mechanism that aims at maximizing the profit of each provider.
A model for applying revenue management to on-demand IT services is presented in [Liu10]. The model uses a
nonlinear objective function to determine the optimal price over different system capacities and multiple
customer classes with different SLAs.
In [Lin12] a branch-and-bound approach together with an adjusting recursive procedure are proposed to evaluate
and maximize the reliability of a computer network in a Cloud Computing environment; the algorithm devised as
solution considers budget, time and stochastic capacity constraints.
Similarly, the problem of minimizing the use of resources and meeting, at the same time, performance
requirements under a certain financial budget and time constraints, has been investigated in [Tia11] for
MapReduce applications.
Load Balancing
In [Ard12b] the authors take the perspective of a Web service provider offering multiple transactional Web
services. They provide a non-linear model of the capacity allocation and load redirection problem for multiple
request classes, which is solved with decomposition techniques exploiting predictive models of the incoming
workload at each physical site. A heuristic solution method for the same problem is instead presented in
[Ard11b].
The decentralized load balancing problem, as opposed to the traditional centralized version, has also been the
subject of recent works. [Ala09] proposes a decentralized load-balancing mechanism that considers
heterogeneous resources; server state information is exchanged so as to minimize the communication overhead
required by a decentralized approach. A bio-inspired algorithm for the load balancing problem is discussed in
[Val11], which investigates an alternative for a decentralized service network, based on an unstructured overlay
network, in which the nodes that host instances of many different service types self-organize into virtual clusters.
The authors present a framework focusing on the load balancing problem, because nodes must be able to
efficiently balance the incoming requests among themselves. The proposed approach combines and exploits the
synergies between the clustering technique and super-peer topologies. Moreover, it inherits the typical benefits
of bio-inspired self-organization, such as scalability with respect to the number of peers, and dynamism and
robustness with respect to unexpected behaviour.
Admission control
In [Wu12], cost-effective admission control and scheduling algorithms for SaaS providers are proposed in order
to maximize profits while improving customer satisfaction levels.
In [Kon12], instead, a probabilistic approach aims to test admission control and to find the optimal allocation of
VMs on physical servers; the multi-objective weighted function incorporates business rules in terms of trust and
cost, and it is associated with constraints representing real factors that compromise the Cloud services, including the provider selection, the variable number of users over time and different workload patterns.
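The admission decision described above can be sketched as a simple profit test. The rule below is an illustrative assumption, loosely in the spirit of the cost-effective admission control of [Wu12], not the paper's actual algorithm; all field names are hypothetical.

```python
# Hypothetical profit-aware admission test, loosely in the spirit of
# the cost-effective admission control of [Wu12]; field names and the
# decision rule are illustrative, not the paper's actual algorithm.

def admit(request, free_capacity):
    """Accept a request only if it fits the remaining capacity and its
    revenue covers resource cost plus the expected SLA-penalty risk."""
    if request["vms"] > free_capacity:
        return False                      # would overload the servers
    cost = request["vms"] * request["hours"] * request["vm_price"]
    expected_penalty = request["penalty"] * request["violation_prob"]
    return request["revenue"] > cost + expected_penalty
```

The expected-penalty term is what makes such policies "customer aware": a request that is profitable on paper is still rejected when the risk of violating its SLA erodes the margin.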
Game Theory approaches
Game theory has found applications in numerous fields such as Economics, Social Science, Political Science and Evolutionary Biology. Over the last few years this branch of applied mathematics has also found applications in problems arising in the ICT industry. For example, resource or QoS allocation problems, pricing and load shedding cannot always be handled with classical pure optimization approaches. Indeed, in a general complex system the interaction across different players is non-negligible: each player can be affected by the actions of all players, not only by his own. Non-cooperative Game Theory tools can reproduce this aspect perfectly. In this setting, a natural modeling framework involves seeking an equilibrium, or stable operating point, for the system.
More precisely, non-cooperative Game Theory is the study of problems of conflict and cooperation among
multiple independent decision-makers, which means the study of the ways in which strategic interactions among
economic agents produce outcomes with respect to the preferences (or utilities) of those agents, where the
outcomes in question might have been intended by none of the agents. Each agent pursues his own interests
MODAClouds
MOdel-Driven Approach for design and execution of applications on multiple Clouds Deliverable # D6.1
Public Final Version 1.0, March 29th 2013
working independently and without assuming anything about what other players are doing. Moreover, he has to
follow certain rules while making his choices and each agent is supposed to behave rationally.
In the language of Game Theory, rationality implies that every player is motivated by maximizing his own utility (or payoff), irrespective of what other players are doing.
Given a game, which strategies will the rational players adopt? Intuitively, a player pursues the case in which his payoff is maximized. Since the payoff function depends also on the strategies of the other players, which in turn are maximizing their own payoffs, a conflict situation is created and it is not easy to characterize the best choice for every player. In other words, when rational players correctly forecast the strategies of their opponents, they are not merely playing best responses to their beliefs about their opponents' play; they are playing best responses to the actual play of their opponents. Indeed, the notion of a solution is more tenuous in game theory than in other fields; it concerns optimality, feasibility and equilibria.
In the fifties a solution concept, due to John Forbes Nash (see [74]), emerged as the most appropriate and effective: when all players correctly forecast their opponents' strategies, and play best responses to these forecasts, the resulting strategy profile is a Nash equilibrium.
Formally, a non-cooperative game Γ in strategic form is a tuple Γ = {N, {X_i}_{i∈N}, {θ_i}_{i∈N}} that consists of:
- a finite set of players N = {1, 2, ..., n}, where n ∈ ℕ;
- a set of strategies X_i for every player i ∈ N, which is also called the feasible set of player i;
- a payoff function θ_i : X_1 × X_2 × ⋯ × X_n → ℝ for each player i ∈ N.
Moreover, we indicate with X = X_1 × X_2 × ⋯ × X_n ⊆ ℝ^M the common strategy set, called the feasible set or strategy space of the game Γ; every point x ∈ X represents a feasible strategy profile of the game. Let us denote with x_{-i} the vector of all the players' variables except the i-th one, x_{-i} = (x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n), so that we can write x = (x_i, x_{-i}). A vector x ∈ X is called a Nash equilibrium for the game if
    θ_i(x) ≥ θ_i(y_i, x_{-i})   for all y_i ∈ X_i and for all i ∈ N.
Equivalently, x is a Nash equilibrium if and only if, for every i ∈ N, x_i solves the maximization problem
    max θ_i(y_i, x_{-i})   s.t. y_i ∈ X_i,
i.e., if and only if no player can improve his payoff function by unilaterally changing his strategy.
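For finite games, the equilibrium condition above can be checked directly by enumerating unilateral deviations. The sketch below is a brute-force illustration of the definition (function and variable names are our own, not from the cited literature):

```python
from itertools import product

# Direct check of the Nash condition: a profile x is an equilibrium
# iff no player i can raise theta_i by deviating unilaterally within
# X_i. Brute force, so only suitable for small finite games.

def is_nash(strategy_sets, payoff, profile):
    """strategy_sets -- list of finite strategy sets X_i
    payoff        -- payoff(i, profile) -> theta_i(profile)
    profile       -- tuple x = (x_1, ..., x_n)"""
    for i, X_i in enumerate(strategy_sets):
        current = payoff(i, profile)
        for deviation in X_i:
            alt = profile[:i] + (deviation,) + profile[i + 1:]
            if payoff(i, alt) > current:
                return False      # profitable unilateral deviation
    return True

def nash_equilibria(strategy_sets, payoff):
    """Enumerate all pure-strategy Nash equilibria."""
    return [x for x in product(*strategy_sets)
            if is_nash(strategy_sets, payoff, x)]
```

For instance, in a prisoner's-dilemma game with payoffs (3,3), (0,5), (5,0) and (1,1) for the profiles (C,C), (C,D), (D,C) and (D,D), the only pure-strategy equilibrium found is mutual defection, even though both players would prefer (C,C).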
Many approaches have been used to represent, model and manage Cloud services at run-time through Game
Theory tools. In [Fen11] the authors present a methodical, in-depth game theory study on price competition, moving
progressively from a monopoly market to a duopoly market, and finally to an oligopoly Cloud market. They
characterize the nature of non-cooperative competition in a Cloud market with multiple competing Cloud service
providers, derive algorithms that represent the influence of resource capacity and operating costs on the solution
and they prove the existence of a Nash equilibrium. On the dynamics of the market, a model of competitive
equilibrium in e-commerce to solve the problem of pricing and outsourcing can be found in [Dub07]; here the
analysis of pricing choices and decisions to outsource IT capability leads to a representation of the Internet
competition and extracts the maximum profit solution. Studies of the maximization of the social welfare as a
long-term social utility are discussed in [Men11]. Considering relevant queuing aspects in a centralized setting,
under appropriate convexity assumptions on the operating costs and individual utilities, the work established
existence and uniqueness of the social optimum. Furthermore, other studies based on non-cooperative game theory are presented in [Wan12], where the authors employ a bidding model to solve the resource allocation problem
in virtualized servers with multiple instances competing for resources. A unique equilibrium point is obtained. A
similar discussion can be found in [Wei10] where a QoS constrained parallel tasks resource allocation problem is
considered.
[Abh12] considers two simple pricing schemes for selling Cloud instances and studies the trade-off between
them. Exploiting Bayes Nash equilibrium the authors provide theoretical and simulation based evidence
suggesting that fixed prices generate a higher expected revenue than hybrid systems.
Using Bellman equations and a dynamic bidding policy, in [Zaf12] an optimal strategy under a Markov spot-price model is found in order to complete jobs with deadline and availability constraints. The performance of the model is evaluated by considering uniformly distributed spot prices and EC2 spot prices. Another work on spot bidding is [Son12], where the authors propose a profit-aware dynamic bidding algorithm,
which observes the current spot price and selects bids adaptively to maximize the average profit of a Cloud service broker while minimizing its costs in a spot instance market.
Finally, a Generalized Nash game for the service provisioning problem has been formulated in [Ard12] and [Ard11], where the perspective of SaaS providers hosting their applications at an IaaS provider is taken. Each SaaS provider needs to comply with end-user applications' SLAs and, at the same time, maximize its own revenue while minimizing the cost of the resources supplied by the IaaS. On the other hand, the IaaS provider wants to maximize the revenues obtained by providing on-spot resources.
4.4.5. Summary tables
A summary of the classification proposed here is reported in Tables 4.4.a to 4.4.l.
Tables 4.4.a to 4.4.d relate to the Problem category, while the Solution category is detailed in Tables 4.4.e to 4.4.j. Finally, Tables 4.4.k and 4.4.l represent the Discipline category, following the classification depicted in Figure 4.4.a.
Problem
The first table, Table 4.4.a, represents the partitioning of the reviewed literature according to the Perspective
sub-category. Each piece of literature can face a specific problem from two distinct points of view, focusing
either on the Cloud provider or on the Cloud end-user.
The Quality Attributes are summarized in Table 4.4.b. Four specific attributes are considered (Performance,
Cost, Availability and Reliability). Other, less common attributes are grouped under the label of Others. As is
clearly shown, the vast majority of the reviewed papers deals with the Performance, Cost and Availability
attributes.
Table 4.4.c, instead, considers the Dimensionality sub-category. It classifies the considered approaches in single-
objective (SOO) and multi-objective (MOO). In this case, one can see that the methodologies presented in
literature mainly belong to the single objective approach.
Finally, the considered taxonomy categorizes the Constraint sub-category into five possible attributes (Table 4.4.d), namely the constraints considered by the state-of-the-art works: Cost, Performance, Availability, Throughput and Memory usage. The literature is almost evenly distributed among these attributes.
Solution
The type of approach (Centralized, Distributed or Hierarchical) implemented is one of the fundamental
characteristics of a solution. Table 4.4.e details how the reviewed papers are subdivided according to this
attribute. Notice that the vast majority of them show a distributed architecture.
Table 4.4.f reports the Decision Variables (DVs) exploited by the various solution methods in order to
effectively explore the design space. A DV is the set of possible actions that can be taken upon a current design
alternative in order to create new alternatives with, possibly, higher quality. It can be easily noticed that most of
the literature leverages the Capacity Allocation as DV.
The Architecture representation is shown in Table 4.4.g. Clearly, the state-of-the-art solutions prefer Optimization models over Architecture-based models.
As far as the Optimization strategy is concerned, the proposed techniques are grouped into two main categories:
Exact methods and Meta-heuristics. Table 4.4.h demonstrates that the literature is evenly distributed between
those two approaches.
Finally, only a few papers include information about the Time scale and the Constraint handling approach. They are reported and classified in Tables 4.4.i and 4.4.j.
Discipline
The last two tables, namely Tables 4.4.k and 4.4.l, group the considered works with respect to their Discipline. A discipline is fully described by means of a certain Type and Quality model. Table 4.4.k addresses the Type sub-category. Three typologies are considered: Utility based, Control theory and Bio-inspired. The Utility based and Bio-inspired approaches are dominant, whereas only a few works fall within the Control theory field.
Table 4.4.l, instead, reports the references to the considered works according to the underlying Quality model.
Table 4.4.a: Problem category: perspective.
Cloud provider: [Xia12], [Dut12], [Dou12], [LinC12], [Sri08], [Maz12], [Kon12], [Zam12], [Xio11], [NeeV11], [Gou11], [Bjo12], [Liu10], [Wu12], [Fen11], [Men11], [WanDJ12], [Abh12], [Ard11], [Ard12], [Ala09].
Cloud end-user: [Tia11], [Had12], [HE12], [Zaf12], [Son12], [Ard11], [Ard12], [Cap10], [Ard12b], [Ard11b].

Table 4.4.b: Problem category: quality attributes.
Performance: [Gou11], [Tia11], [Bjo12], [Wu12], [Xia12], [Zaf12], [Cap10], [Ard12b], [Ard11b].
Cost: [Liu10], [Zaf12], [Son12], [Men11], [Had12], [Dub07], [Fen11], [Wan12], [Kon12], [Ard12b], [Ard11b].
Availability: [Wei10], [Zam12], [Dut12], [Had12], [HE12], [Val11].
Reliability: [LinC12], [Xio11].
Others: [Ala09], [Sri08], [Maz12], [Dou12], [Cap10].

Table 4.4.c: Problem category: dimensionality.
Single-objective optimization: [Gou11], [Tia11], [Bjo12], [Fen11], [Liu10], [Zaf12], [Son12], [Men11], [Had12], [Wei10], [Zam12], [Dut12], [HE12], [LinC12], [Xio11], [Dou12], [Sri08], [Wan12], [Cap10], [Kon12], [Ard12b], [Ard11b], [Ala09].
Multi-objective optimization: [Wu12], [Xia12], [Dub07], [Meh12], [Maz12].

Table 4.4.d: Problem category: constraints.
Cost: [Tia11], [Liu10], [Dub07], [Son12], [Fen11], [Wei10], [LinC12], [Ard12b], [Ard11b].
Performance: [Wu12], [Gou11], [Tia11], [Fen11], [Had12], [HE12], [LinC12], [Dou12], [Ard12b], [Ard11b].
Availability: [Zaf12], [Wei10], [Zam12], [Had12].
Throughput: [Son12], [Dut12].
Memory: [Gou11], [Liu10], [Meh12], [Had12], [Xio11].

Table 4.4.e: Solution category: type.
Centralized: [Bjo12], [Liu10], [Men11], [Tia11], [Dub07], [Meh12], [Wan12].
Distributed: [Wei10], [Sri08], [Cap10], [Val11], [Ard12b], [Ard11b], [Ala09].
Hierarchical: [Meh12].
Table 4.4.f: Solution category: degrees of freedom.
Provider selection: [Xia12], [Dut12], [Xio11], [Kon12].
Application placement: [Xia12], [Had12], [HE12], [Sri08], [Cap10], [Kon12], [Wan12], [Ala09].
Capacity allocation: [Wu12], [Gou11], [Tia11], [Bjo12], [Liu10], [Zaf12], [Son12], [Men11], [Fen11], [Had12], [Wei10], [Zam12], [LinC12], [Xio11], [Dou12], [Maz12], [Sri08], [Ard12b], [Ard11b].
Load balancing: [Xia12], [Val11], [Ard12b], [Ard11b], [Ala09].
Admission control: [Wu12], [Kon12], [Ala09].

Table 4.4.g: Solution category: architecture representation.
Architecture models: [Xia12].
Optimization model: [Gou11], [Tia11], [Wu12], [Fen11], [Liu10], [Men11], [Zam12], [Xio11], [Dut12], [Dou12], [Wan12], [Kon12], [Ard12b], [Ard11b], [Ala09].

Table 4.4.h: Solution category: optimization strategy.
Exact: [Bjo12], [Liu10], [Zaf12], [Son12], [Men11], [Fen11], [Had12], [Wei10], [Xio11], [Dou12], [Ard11b], [Ala09].
Meta-heuristic: [Wu12], [Gou11], [Wei10], [HE12], [LinC12], [Maz12], [Sri08], [Wan12], [Ard12b].

Table 4.4.i: Solution category: constraints handling.
Not present: [Dub07].
Hard: [Fen11], [HE12], [Ard12b], [Ard11b].
Penalty: [Liu10].

Table 4.4.j: Solution category: time scale.
Minute: [Wu12], [Ard12b], [Ard11b].
Hour: [Maz12], [Ard12b], [Ard11b].
Day: [Fen11].

Table 4.4.k: Discipline category: type.
Utility based: [Gou11], [Tia11], [Dub07], [Men11], [Fen11], [Wu12], [Wei10], [Ard12b], [Ard11b], [Ala09].
Control theory: [Zaf12], [Had12], [Kon12], [Xio11].
Bio-inspired: [Cap10], [Val11], [Wan12].

Table 4.4.l: Discipline category: quality model.
Markov chain: [Zaf12], [Son12].
Queuing network: [Gou11], [Tia11], [Dub07], [Men11], [Fen11], [Wu12], [Wei10], [Ard12b], [Ard11b], [Ala09].
State based model: [Wu12].
4.4.6. Criteria for evaluation
In order to assess the quality of the solution methods proposed in the literature, several evaluation criteria can be considered. Given the run-time constraints, the time required to find a solution, or the maximum size of the problem instance that can be solved in a given time horizon, needs to be considered. These measures depend on practical and physical limitations, the specific application under study, the industry's aim or the research's purpose, as well as on the opportunities, tools and resources available.
Another important evaluation criterion is scalability, i.e., the ability of the solution method to handle problems of growing size or to enlarge the optimization scope (e.g., adding further quality metrics or constraints).
Another important aspect is the accuracy that can be achieved by the underlying quality evaluation model, that is, the accuracy obtained when comparing the QoS metrics evaluated through the QoS model with the real figures measured in the real system.
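The time-to-solution and scalability criteria can be operationalized with a simple harness: solve randomly generated instances of growing size and record how long each takes. The sketch below is illustrative; `solve` stands for any allocation algorithm under evaluation, and the instance generator is our own placeholder, not one from the reviewed papers.

```python
import random
import time

# Minimal harness for the time-to-solution and scalability criteria:
# solve randomly generated instances of growing size and record the
# wall-clock time each one takes. 'solve' is a placeholder for any
# allocation algorithm under evaluation.

def measure_scalability(solve, sizes, seed=0):
    """Return a list of (instance size, seconds to solve)."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        instance = [rng.random() for _ in range(n)]  # random demands
        start = time.perf_counter()
        solve(instance)
        results.append((n, time.perf_counter() - start))
    return results
```

Plotting the recorded times against instance size is the usual way the works below argue linear (or otherwise bounded) scaling.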
In [Sri08] four simulations are compared, keeping the number of applications constant while varying disk and CPU utilizations, showing that the energy used by the proposed heuristic is about 5.4% above the optimal value, with an average 20% tolerance. No information about scalability is reported.
To evaluate the scalability of the resource allocation algorithm they propose, the authors of [Ard11] considered a very large set of randomly generated instances. These instances were created by varying the number of SaaS providers between 10 and 100 and the number of applications between 1000 and 10000. They showed that the problem can be solved, in the worst case, in less than twenty minutes.
In [Dut12] the authors varied the number of servers in an emulated data center and observed the performance, demonstrating that the total cost of their approach increases linearly with the number of servers. They also demonstrated that the running time is statistically independent of the number of servers.
A large-scale simulation demonstrates that the algorithm presented in [Xia12] is extremely scalable: the decision
time remains under 4 seconds for a system with 10000 servers and 10000 applications.
[Had12] reports a complete scalability study. The deviation from the optimal value is shown to be consistently small and tends to zero as the number of physical machines (PMs) increases. This means that the proposed algorithm is capable of finding solutions very close to the optimal one for a large number of PMs and for a big Cloud provider with many data centers.
The proposed method scales much better than common bin-packing algorithms, which encounter scalability problems and take longer to find the optimal solution.
Finally, in [Ard12] the inefficiency of the two algorithms presented is measured in terms of Price of Anarchy (PoA) and Individual Worst Case (IWC). A very large number of randomly generated instances is considered: the number of SaaS providers varies between 10 and 100, while the number of applications varies between 100 and 1000. Furthermore, the article focuses on scalability, arguing that the algorithms scale linearly with the cardinality of the set of SaaS providers.
5. MODAClouds Run-Time Platform
5.1. Overview
The aim of this section is to define the requirements for the MODAClouds runtime platform that will be
developed in the project. After the introduction to the overall goals of the runtime environment, the general
approach and the high-level conceptual architecture described in Section 1, we define the actors (Section 5.1.1)
that are referenced in the requirements specifications (Sections 5.2-5.5) and the requirement sets (Section 5.1.2).
The requirements elicitation methodology that we have adopted is overviewed in Section 5.1.3. Finally, Section
5.6 provides a roadmap for WP6, focusing in particular on Year 1 of the project.
5.1.1. Actors
In this section, we consider the three platforms defined in the conceptual architecture as actors included in the requirements specifications. In addition to these, we consider in the requirements specifications a set of common actors that are referenced also in the other WPs' requirements specifications:
Cloud app developer: A developer who designs, implements, and tests cloud-based applications.
Cloud app: the cloud application developed by the Cloud app developer using the MODAClouds IDE.
Application cloud: the cloud platform where the Cloud app is (or will be or was) running.
Service cloud: the cloud platform where the runtime services offered by the runtime platform are (or
will be or were) running. A service is not part of the Cloud app, rather it is part of the execution
platform (e.g., discovery service).
MODAClouds IDE: this is the envisioned technical output of WP4 and WP5, a design-time
environment that will implement the MODACloudML language and that will provide the application
code and the initial deployment decisions that are needed by the runtime platform to instantiate the
application.
Cloud app admin: An administrator who configures, deploys, operates, and tests cloud-based
applications on cloud platforms.
Cloud app provider: A provider that provides cloud-based applications.
QoS engineer: An engineer who specifies quality of service (QoS) constraints and alternatives for
design time exploration and run-time adaptation.
Throughout the requirements elicitation, we use the notation <A> to indicate actor A, e.g., <Cloud app admin>. Furthermore, we refer generically to QoS constraints to mean any hard or soft constraints regarding QoS (e.g., imposed by an SLA) and specified in the MODAClouds IDE.
5.1.2. Requirement Sets
In the following sections, we describe the requirements for the runtime platform. The requirements have been
grouped into four categories inspired by the conceptual architecture. The main distinction from the conceptual
architecture mapping is that the requirements for the monitoring platform are distinguished into two further sets:
monitoring requirements and analysis requirements. The former set mainly deals with monitoring data collection
and distribution, while the latter set emphasizes the analysis of the acquired monitoring data to extract
knowledge.
The sets of requirements elicited in the rest of this section are as follows:
Execution Requirements (Section 5.2): this group provides requirements for application deployment,
initial testing, execution, and runtime management. Management functionalities include runtime
services (e.g., discovery, logging, application health controllers) and data management (archival and
synchronization).
Monitoring Requirements (Section 5.3): this group provides requirements for the part of the
monitoring platform that will deal with data collection, preprocessing, distribution and consumption by
means of monitoring data observers.
Analysis Requirements (Section 5.4): this group provides a list of requirements for the data analysis
part. These requirements deal with high-level aggregation and processing of the monitoring data and
characterize the analysis step of the MAPE-K loop.
Self-adaptivity Requirements (Section 5.5): this group provides requirements for the subsystems that
will implement the runtime models and runtime policies developed in WP6.
5.1.3. Requirement Elicitation Methodology
For each group of requirements, we use the guidelines provided in D3.1.1 to define use case scenarios. For the
sake of readability, unused entries in tables are omitted. Furthermore, qualitative requirements that provide more
details about a use case and the environment with which it interacts are provided in the Other requirements
subsection. These additional requirements also form necessary requirements for the WP6 runtime architecture.
To help readability, we express these Other requirements using the keywords proposed in the Internet
Engineering Task Force RFC 2119 which are here briefly summarized and related to the Priority of
accomplishment keywords indicated in D3.1.1 (i.e., Must/Should/Could have):
"MUST"/"MUST NOT"/"REQUIRED"/"SHALL"/"SHALL NOT": equivalent expressions to indicate
Must have priority of accomplishment.
"SHOULD"/"SHOULD NOT"/"RECOMMENDED"/NOT RECOMMENDED: equivalent keywords
to indicate Should have priority of accomplishment.
"MAY"/"OPTIONAL": equivalent expressions to indicate Could have priority of accomplishment.
We point to http://www.ietf.org/rfc/rfc2119.txt for further details.
5.2. Execution Requirements
5.2.1. Context and System Overview
5.2.1.1. Context
Use case template Description
Category name Execution
The scope of the following use case specification is to elicit requirements for application deployment and execution. In the context of the MODAClouds reference architecture, this falls primarily within the Execution Platform.
5.2.1.2. System Boundary Model
Figure 5.2.a: Execution Requirements
5.2.2. Use case specification for the Run Application use case
Use case heading Description
Use case name Run application
Use case ID UC-MC.wp6.Execution.Run Application.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Run application use case of the system boundary model in Section 5.2.1.2.
Goal The goal of the Run application use case is to start, stop, status query, and manage a
<Cloud app> instance on the <Application Cloud>.
Main Actors
<Cloud app admin>
<Cloud app>
<MODAClouds IDE>
Use case scenarios Description
Main success
scenarios
1. The <MODAClouds IDE> requests to start or stop <Cloud app>. Alternatively,
the <Cloud app admin> requests via a web-based UI to start or stop <Cloud
app>. The <Execution Platform> automatically starts or stops the application on the target <Application Cloud>.
2. The <Cloud app admin> requests via a web-based UI to the <Execution
Platform> to view the configuration and the logs of the running <Cloud app>.
3. The <Execution Platform> feeds back information to the caller about the status
of the application.
Preconditions
1. The application is compliant with the restrictions of the <Application Cloud>,
and uses the appropriate API's, packaging, etc.
Postconditions
1. The application has been successfully deployed on the <Application Cloud>
Other requirements:
1. An instance of the <Execution Platform> can start or stop a single instance of <Cloud app> and be
deployed on a single <Application cloud>. Therefore, separate <Cloud app> instances MUST have separate
<Execution Platform> instances.
2. The <Execution Platform> and the <Cloud app> MUST be treated as independent software artifacts.
They MAY run on different clouds, preferably within (network) topological proximity to reduce
latency. Therefore, they SHOULD rely as much as possible on services and protocols that can operate
in any cloud environment (e.g., HTTP-based RESTful services).
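The start/stop/status interactions of the Run application scenarios can be sketched as a minimal lifecycle controller. The state names and transitions below are illustrative assumptions, not the WP6 design; a real platform would drive cloud-provider APIs at each transition.

```python
# Sketch of the Run application use case as a lifecycle controller:
# start, stop and status query of a single <Cloud app> instance.
# State names and transitions are illustrative assumptions, not the
# WP6 design; a real platform would call cloud-provider APIs here.

class CloudAppController:
    def __init__(self):
        # precondition: the application is already deployed
        self.state = "DEPLOYED"

    def start(self):
        if self.state in ("DEPLOYED", "STOPPED"):
            self.state = "RUNNING"      # scenario 1: start the app
        return self.state

    def stop(self):
        if self.state == "RUNNING":
            self.state = "STOPPED"      # scenario 1: stop the app
        return self.state

    def status(self):
        return self.state               # scenario 3: report status
```

Keeping the controller separate from the application itself mirrors the requirement that the <Execution Platform> and the <Cloud app> are independent software artifacts communicating over cloud-agnostic protocols.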
5.2.3. Use case specification for the Deploy Application use case
Use case heading Description
Use case name Deploy Application
Use case ID UC-MC.wp6.Execution.Deploy Application.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Deploy application use case of the system boundary model in Section 5.2.1.2.
Goal To deploy the packaged <Cloud app> to the targeted <Application cloud>
Main Actors
< Cloud app >
<Cloud app admin>
<MODAClouds IDE>
Use case scenarios Description
Main success
scenarios
1. The <Execution Platform> is instructed by the <MODAClouds IDE>, or by
<Cloud app admin>, to deploy the application.
2. The <Execution Platform> will provision the required resources from the
<Application cloud> on behalf of the <Cloud app admin>.
3. The <Execution Platform> will then deploy all the needed software artifacts to
run the <Cloud app>, which includes the <Cloud app> itself and other
MODAClouds Services needed for the application.
Preconditions
1. The <Cloud app> was packaged properly for the <Application cloud>
2. The <Cloud app admin> has the proper credentials to access the <Application
cloud>.
3. The <Cloud app admin> has delegated the credentials to the <Execution
Platform>
Postconditions
1. The <Cloud app> has been successfully deployed on the <Application cloud>
Other requirements:
Deploying the application includes deploying the necessary <MODAClouds services>.
5.2.4. Use case specification for the Start/Stop Application Sandbox use case
Use case heading Description
Use case name Start/Stop Application Sandbox
Use case ID UC-MC.wp6.Execution. Start/Stop Application Sandbox.-V01
Priority of accomplishment Should Have
Use case description Description
Use case diagram See the Start/stop application sandbox use case of the system boundary model in
Section 5.2.1.2.
Goal Start/stop the application in a controlled container for the purpose of application
testing or calibration of the services and their internal data structures (e.g., runtime
models).
Main Actors <Cloud app>
<MODAClouds IDE>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <Cloud app admin> or <MODAClouds IDE> requests to the <Execution
Platform> to create a sandbox environment for <Cloud App>.
2. <Execution Platform> creates a sandbox environment and configures the
services for executing in this special environment.
Postconditions
1. <Execution Platform> accepts the same requests as in a normal
environment (e.g., Deploy Application, etc) but these are all performed
in a sandbox environment.
5.2.5. Use case specification for the Synchronise Application Data use case
Use case heading Description
Use case name Synchronise Application Data
Use case ID UC-MC.wp6.Execution.Synchronise Application Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Synchronise Application Data use case of the system boundary model in Section
5.2.1.2.
Goal To allow the Cloud app developer or QoS engineer to specify data replicas and the synchronization requirements for them.
Actors <QoS engineer>
<Cloud app developer>
Use case scenarios Description
Main success
scenarios
1. The QoS engineer or Cloud app developer selects, for a portion of the database
or for the whole database, the synchronization requirements between replicas.
These can be: consistent or eventually consistent.
2. The execution platform examines the deployment configuration of the database
and creates and activates the proper synchronization connectors between the
replicas
Preconditions The application has been already deployed on the execution platform
Postconditions The execution platform is ready to keep the data replicas synchronized according to the
selected synchronization requirements
Other requirements:
1. The execution platform MAY offer the possibility to check that the synchronization choices made by
the user are consistent with the way the system is deployed
2. The execution platform MAY offer the possibility to change the synchronization requirements
dynamically
5.3. Monitoring Requirements
5.3.1. Context and System Overview
5.3.1.1. Context
Use case template Description
Category name Monitoring
The scope of the following use case specifications is to detail the main functionalities offered by the monitoring
platform.
5.3.1.2. System boundary model
Figure 5.3.a: Monitoring Requirements
5.3.2. Use case specification for the Install Monitoring Rule use case
Use case heading Description
Use case name Install Monitoring Rule
Use case ID UC-MC.wp6.Monitoring.Install Monitoring Rule.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Install Monitoring Rule use case of the system boundary model in Section 5.3.1.2.
Goal Monitoring rules are produced at design time by WP5 and define the object to be monitored,
which measures should be gathered, the time window in which monitoring should happen,
and the frequency of monitoring. The goal of this use case is to allow new rules to be installed in
the monitoring platform.
Main Actors <MODAClouds IDE>
<Cloud app admin>
Use case scenarios Description
Main success
scenarios
1. The Cloud app developer, through the <MODACloudsIDE>, or the <Cloud app
admin>, through a direct interface, requests the installation of a new rule.
2. The <Monitoring Platform> checks that the rule has not been previously installed.
3. If the previous check is successful, then the <Monitoring Platform> installs the
rule and puts it in the state Inactive.
Preconditions The <Monitoring Platform> is ready to start its service
Postconditions The monitoring rule is properly installed in the monitoring platform
Other requirements:
1. The <Monitoring Platform> MUST allow for installation of new monitoring rules before monitoring
starts
2. The <Monitoring Platform> MUST allow for the installation of multiple monitoring rules
3. The Monitoring Platform, upon installation of a monitoring rule, MAY check that it can be actually
executed in the current <Monitoring Platform> configuration, i.e., the associated data can be gathered
and the corresponding computations/filtering/compositions can be executed
4. The <Monitoring Platform> MAY allow for installation of new monitoring rules during its execution
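The main success scenario and requirements 1–4 above suggest a rule registry with a duplicate check at installation time and a default Inactive state. A minimal Python sketch of that scenario (the class and field names are illustrative, not part of the deliverable):

```python
from dataclasses import dataclass

@dataclass
class MonitoringRule:
    rule_id: str
    target: str      # object to be monitored
    metric: str      # measure to gather
    window_s: int    # time window in which monitoring happens
    period_s: int    # frequency of monitoring

class MonitoringPlatform:
    def __init__(self):
        self._rules = {}   # rule_id -> (rule, state)

    def install(self, rule):
        # Step 2: check that the rule has not been previously installed.
        if rule.rule_id in self._rules:
            return False
        # Step 3: install the rule and put it in the Inactive state.
        self._rules[rule.rule_id] = (rule, "Inactive")
        return True
```

Multiple rules can be installed (requirement 2); a repeated install of the same rule is rejected.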
5.3.3. Use Case Specification for the Activate/Deactivate Monitoring Rule use case
Use case heading Description
Use case name Activate/Deactivate Monitoring Rule
Use case ID UC-MC.wp6.Monitoring.Activate/Deactivate Monitoring Rule.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Activate/Deactivate Monitoring Rule use case of the system boundary model in
Section 5.3.1.2.
Goal Monitoring rules are produced at design time by WP5 and define the object to be monitored,
which measures should be gathered, the time window in which monitoring should happen,
the frequency of monitoring. The goal of this use case is to activate a monitoring rule
already installed in the <Monitoring Platform> or to deactivate an activated one.
Main Actors <Cloud app admin>
Use case scenarios Description
Main success
scenarios
Activation scenario
1. The <Cloud app admin> through a specific user interface requests the
activation of a rule that is installed and in the Inactive state
2. The <Monitoring Platform> checks that it can collect the required
measures based on its current internal configuration
Deactivation scenario
1. The <Cloud app admin> through a specific user interface requests the
deactivation of a rule that is in the Active state
2. The <Monitoring Platform> stops the execution of the monitoring rule and
puts it in the Inactive state.
3. If the deactivated rule was the last active one, the monitoring platform
stops collecting monitoring data.
Preconditions The <Monitoring Platform> is ready to start its service or it is already running
Postconditions Upon activation of a monitoring rule, the <Monitoring Platform> starts executing it
Upon deactivation of a monitoring rule, the <Monitoring Platform> stops executing it
Other requirements:
1. The <Monitoring Platform> MUST allow for activation and deactivation of monitoring rules during
execution
2. The <Monitoring Platform> MUST execute all Active rules.
3. The Monitoring Platform, upon activation of a monitoring rule, MAY check that it can be actually
executed in the current <Monitoring Platform> configuration, i.e., the associated data can be gathered
and the corresponding computations/filtering/compositions can be executed.
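The activation check in step 2 and the last-active-rule condition of the deactivation scenario can be sketched as a small state machine. In this Python sketch the "current internal configuration" is simplified to a static set of collectable metric names; all names are illustrative:

```python
class RuleRegistry:
    """Tracks installed monitoring rules and whether data collection runs."""

    def __init__(self, collectable_metrics):
        self._states = {}                    # rule_id -> "Active" | "Inactive"
        self._metrics = {}                   # rule_id -> metric name
        self._collectable = set(collectable_metrics)
        self.collecting = False

    def install(self, rule_id, metric):
        self._states[rule_id] = "Inactive"
        self._metrics[rule_id] = metric

    def activate(self, rule_id):
        # Check the required measure can be collected in the current configuration.
        if self._metrics[rule_id] not in self._collectable:
            raise ValueError("metric not collectable in current configuration")
        self._states[rule_id] = "Active"
        self.collecting = True

    def deactivate(self, rule_id):
        self._states[rule_id] = "Inactive"
        # If the deactivated rule was the last active one, stop collecting data.
        if not any(s == "Active" for s in self._states.values()):
            self.collecting = False
```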
5.3.4. Use case specification for the Add/Remove Observer use case
Use case heading Description
Use case name Add/Remove Observer
Use case ID UC-MC.wp6.Monitoring.Add/Remove Observer.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Add/Remove Observer use case of the system boundary model in Section 5.3.1.2.
Goal An observer is any software component that needs to receive information from the
monitoring platform. The objective of Add Observer is to allow new components to
subscribe to the monitoring platform. Upon subscription they will start receiving a specific
stream of data. Such a stream is specified as part of the Add Observer operation in terms of
an RDF query. The Remove Observer operation simply detaches an observer from a stream.
Main Actors <MODAClouds IDE>
<Cloud app admin>
<Self-adaptation platform>
Use case scenarios Description
Main success
scenarios
Add Observer scenario
1. The <MODAClouds IDE>, the <Cloud app admin> or the <Self-adaptation
platform> (generically called the Observer) requests the Add Observer operation by
passing an RDF query and a reference to itself as parameters
2. The <Monitoring Platform> checks if it can fulfil the specified RDF query
3. If yes, it adds the observer to its list and returns a reference to the proper stream.
Delete Observer scenario
1. The Observer requests the Delete Observer operation by passing the reference
of itself as parameter.
2. The <Monitoring Platform> checks if the observer is in the list.
3. If yes, then it removes the observer from the list.
Preconditions The <Monitoring Platform> is up and running
Postconditions The list of observers remains in a consistent state
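Both scenarios above amount to keeping the observer list consistent while mapping each observer to the stream its RDF query selects. A Python sketch, treating the RDF query as an opaque string and the "can fulfil" check as simple membership in a supported set (both simplifications; the stream-reference format is invented):

```python
class ObserverRegistry:
    """Maps observers to the RDF query whose result stream they receive."""

    def __init__(self, supported_queries):
        self._supported = set(supported_queries)   # queries the platform can fulfil
        self._observers = {}                       # observer ref -> RDF query

    def add_observer(self, observer_ref, rdf_query):
        # Step 2: check the platform can fulfil the specified RDF query.
        if rdf_query not in self._supported:
            return None
        # Step 3: register the observer and hand back a stream reference.
        self._observers[observer_ref] = rdf_query
        return f"stream://{rdf_query}"

    def remove_observer(self, observer_ref):
        # Remove only if the observer is actually in the list (step 2 of Delete).
        return self._observers.pop(observer_ref, None) is not None
```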
5.3.5. Use case specification for the Collect Monitoring Data use case
Use case heading Description
Use case name Collect Monitoring Data
Use case ID UC-MC.wp6.Monitoring.Collect Monitoring Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Collect Monitoring Data use case of the system boundary model in Section 5.3.1.2.
Goal The successful collection of required metrics from both application level and cloud level
(PaaS and IaaS containers), based on the monitoring rules in the Active state.
Main Actors <Cloud app> (here the Cloud app generically represents any monitorable
resource; it may also include the Application cloud if this provides proper
monitoring mechanisms)
Use case scenarios Description
Main success
scenarios
Collect monitoring data in pull mode
1. Periodically the <Monitoring Platform> checks if the assigned monitoring cost
constraint is still positive
2. If not, then the <Monitoring Platform> closes the connection with the
Application cloud or <Cloud app>
3. If yes, it queries the Application cloud or <Cloud app> in order to receive
monitoring information.
4. If the query is well formed and the Application cloud or <Cloud app> interface
is running, the Application cloud or <Cloud app> provides the required data.
5. The <Monitoring Platform> executes the Active monitoring rules on the
collected information
6. Then it gives the control to the Distribute Data use case
Collect monitoring data in push mode
7. The <Monitoring Platform> periodically receives data from the Application
cloud or the <Cloud app>
8. The <Monitoring Platform> executes the Active monitoring rules on the
collected information
9. Then it gives the control to the Distribute Data use case
10. Periodically the <Monitoring Platform> checks if the assigned monitoring cost
constraint is still positive
11. If not, then the <Monitoring Platform> closes the connection with the
Application cloud or <Cloud app>
Exceptions
In the pull mode if data do not arrive within the expected (configurable) time frame, the
<Monitoring Platform> raises an alarm to the <Cloud app> administrator
Preconditions At least a monitoring rule is active in the monitoring platform.
The Application cloud and <Cloud app> components that are able to provide the
required data are known to the <Monitoring Platform> and a connection with them has
been already established.
Postconditions Data are acquired and passed to the Distribute Data use case.
Other requirements:
1. The <Monitoring Platform> MUST acquire at runtime QoS metrics (the performance, availability,
and health metrics specified in deliverable D6.2) from the <Cloud app> and, if exposed by the
cloud provider, from its Application cloud (either IaaS or PaaS).
2. The <Monitoring Platform> MUST acquire historical and current information about the resource
usage costs incurred to run the CloudApp.
3. For cloud platforms offering resources at spot prices (e.g., EC2 spot instances), the <Monitoring
Platform> MAY also be able to acquire spot prices relative to a custom time horizon.
4. The <Monitoring Platform> MAY rely on existing standard monitoring APIs (e.g., JMX), tools
(e.g., SIGAR, sar), and cloud provider monitoring APIs.
5. Each monitoring cost constraint MUST be configured within the <Monitoring Platform> probe by
the Execution Platform at deployment time of the CloudApp.
6. The monitoring cost constraint value MAY be updated at runtime by the Self-Adaptation Platform
for cost or overhead management purposes. If a metric can be acquired at no cost, then its cost
constraint will be infinite.
7. The <Monitoring Platform> MUST offer the ability to activate and deactivate the acquisition of
certain information at application runtime.
8. The <Monitoring Platform> MAY offer the ability to adjust the sampling rate at which data is
acquired from the Application cloud or <Cloud app>. This adjustment is requested by the
Self-Adaptation Platform.
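Steps 1–5 of the pull-mode scenario can be sketched as a single collection cycle guarded by the monitoring cost constraint. In this Python sketch the probe, the per-metric costs, and the rule shape are all illustrative; a rule is a (metric, cost, check) triple where `check` flags whether the sampled value satisfies the rule:

```python
def pull_collect(probe, budget, active_rules):
    """One pull-mode collection cycle, guarded by a monitoring cost budget.

    `probe` is a callable returning {metric: value}; `budget` is the remaining
    monitoring cost constraint; `active_rules` holds the Active monitoring
    rules as (metric, cost, check) triples.
    """
    results = []
    for metric, cost, check in active_rules:
        if budget <= 0:
            break                # cost constraint exhausted: stop querying
        sample = probe()         # query the Application cloud / <Cloud app>
        budget -= cost
        # Execute the Active monitoring rule on the collected information.
        results.append((metric, check(sample[metric])))
    return budget, results       # results then flow to the Distribute Data use case
```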
5.3.6. Use case specification for the Distribute Data use case
Use case heading Description
Use case name Distribute Data
Use case ID UC-MC.wp6.Monitoring.Distribute Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Distribute Data use case of the system boundary model in Section 5.3.1.2.
Goal The successful distribution of information to the observers connected to the monitoring
platform.
Main Actors <MODAClouds IDE>
<Cloud app admin>
<Self-adaptation platform>
Use case scenarios Description
Main success
scenarios
1. The <Monitoring Platform> executes the queries defined by the observers on the
data collected through the Collect Data use case
2. The <Monitoring Platform> sends to the observers all data that match the queries
associated to them (these data are sent through a stream)
Preconditions The <Monitoring Platform> is acquiring data through the Collect Data use case
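The two steps above reduce to matching each collected data item against every observer's query and pushing the matches on that observer's stream. A Python sketch in which observer queries are stand-in predicates rather than real RDF queries, and streams are plain lists:

```python
def distribute(data_items, observers):
    """Route each collected data item to every observer whose query matches it.

    `observers` maps an observer name to a predicate standing in for its
    query; returns {observer: [matching items]}, the per-observer stream.
    """
    streams = {name: [] for name in observers}
    for item in data_items:
        for name, query in observers.items():
            if query(item):
                streams[name].append(item)
    return streams
```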
5.4. Analysis of Requirements
5.4.1. Context and System Overview
5.4.1.1. Context
Use case template Description
Category name Analysis
The scope of the following use case specification is to define the analysis and measurement functionalities of the
Execution Platform. These functionalities have the goal of receiving monitoring data and extracting aggregate
metrics and knowledge from them.
5.4.1.2. System Boundary Model
Figure 5.4.a: Analysis Requirements
5.4.1.3. Use case specification for the Detect Violation use case
Use case heading Description
Use case name Detect Violation
Use case ID UC-MC.wp6.Analysis.Detect Violation.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Detect Violation use case of the system boundary model in Section 5.4.1.2.
Goal Detect a violation of a QoS constraint on a measured metric and raise a trigger to the
<Self-Adaptation Platform>.
Main Actors <Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
1. A Monitoring rule of the <Monitoring Platform> detects a violation
in the value of one or more QoS metrics.
2. <Monitoring Platform> automatically raises a trigger to all
registered observers.
Preconditions
1. One or more monitoring rules are installed and active on the
<Monitoring Platform>
2. There exist one or more registered observers to the triggers.
Postconditions
1. Triggers are raised in the presence of QoS violations
Other requirements:
1. Detection rules SHOULD be specified as part of the monitoring queries installed in the <Monitoring
Platform>. These CAN be either SLA requirements or soft QoS constraints that are requested by the
developer.
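A minimal sketch of the detection step: each sampled QoS metric is checked against a constraint, and a trigger is raised to the registered observers for every violated one. Representing constraints as (lower, upper) bound pairs and the trigger as a `notify` callback are simplifying assumptions, not the deliverable's rule format:

```python
def detect_violations(samples, constraints, notify):
    """Check each QoS metric sample against its constraint and raise triggers.

    `constraints` maps metric -> (lower_bound, upper_bound); `notify` is
    called once per violated metric, standing in for the trigger raised to
    all registered observers.
    """
    violated = []
    for metric, value in samples.items():
        lo, hi = constraints.get(metric, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            violated.append(metric)
            notify(metric, value)
    return violated
```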
5.4.2. Use case specification for the Correlate Monitoring Data use case
Use case heading Description
Use case name Correlate Monitoring Data
Use case ID UC-MC.wp6.Analysis.Correlate Monitoring Data.-V01
Priority of accomplishment Should Have
Use case description Description
Use case diagram See the Correlate Monitoring Data use case of the system boundary model in
Section 5.4.1.2.
Goal The goal is to establish a relationship between measurements collected on different
components of the application, with the aim of generating measures that summarize
component runtime execution correlations.
Main Actors
<Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Correlation in deterministic mode
1. <Monitoring Platform> collects monitoring data from a set of streams
2. Based on the timestamps, <Monitoring Platform> outputs on a new
stream a measure that pairs events from different sources as being
related to each other
Correlation in statistical mode (black box)
1. <Monitoring Platform> collects monitoring data from a set of streams
2. Within a time window, <Monitoring Platform> runs a statistical
correlation algorithm
3. <Monitoring Platform> outputs on a new stream a measure that
describes the statistical similarities between metrics coming on the
different streams
Correlation in statistical mode (white box)
1. <Monitoring Platform> collects monitoring data from a set of streams
2. For each monitoring metric to be correlated, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for correlation, returning an error if not
3. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
4. Within a time window, <Monitoring Platform> runs a statistical
correlation algorithm, that exploits the application model available to
the <Monitoring Platform> (precondition), to find statistical similarities
between metrics coming on the different streams
5. <Monitoring Platform> outputs on a new stream a measure that
describes the statistical similarities between metrics coming on the
different streams
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers are registered to receive data from these
measures, most likely components of the <Self-Adaptation Platform>
3. <Execution Platform> maintains time synchronization for monitoring
data collected from different sources
4. <MODAClouds IDE> has provided to <Monitoring Platform>
information on the dependencies between the application components
5. <Self-Adaptation Platform> has provided to <Monitoring Platform>
information on the current topology of the application
6. <Self-Adaptation Platform> has registered as an observer on the
output streams
Postconditions
Correlation measures provided in output by
<Monitoring Platform> on one or more streams
Other requirements:
1. Correlation in deterministic mode MAY be offered via a standard data stream aggregator solution
programmed to account for the specificity of the MODAClouds Execution Platform.
2. The Monitoring Platform SHOULD be capable of correlating events only based on information
independent of the specific target cloud being considered
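In the statistical (black-box) mode, one simple instance of the correlation step is to compute, within the time window, a Pearson correlation coefficient between two metric streams. The deliverable does not prescribe the algorithm, so this Python sketch shows only one possible choice:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length metric windows.

    Returns a value in [-1, 1] describing the statistical similarity
    between the metrics coming in on the two streams.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The deterministic mode would instead pair events directly by timestamp, as the first scenario describes.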
5.4.3. Use case specification for the Estimate Measure use case
Use case heading Description
Use case name Estimate Measure
Use case ID UC-MC.wp6.Analysis.Estimate Measure.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Estimate Measure use case of the system boundary model in Section 5.4.1.2.
Goal
Estimate QoS metrics of the system within a time horizon specified by the
<MODAClouds IDE> that are not directly observable by the data collectors, or that
cannot be observed due to overhead concerns.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Estimation in black-box mode
1. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
2. For each metric to be estimated, <Monitoring Platform> determines the
streams needed to estimate that metric and returns an error if one or
more are unavailable
3. <Monitoring Platform> continuously runs estimation algorithms to
estimate the value of the metrics that cannot be directly observed by the
monitoring probes
4. The results of the estimation algorithms are put in output on streams
consumed by observers of the <Self-Adaptation Platform> and by
<MODAClouds IDE> via the feedback loop
Estimation in white-box mode
5. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
6. For each metric to be estimated, <Monitoring Platform> determines the
streams needed to estimate that metric and returns an error if one or
more are unavailable
7. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
8. <Monitoring Platform> continuously runs estimation algorithms to
estimate the value of the metrics that cannot be directly observed by the
monitoring probes
9. The results of the estimation algorithms are put in output on streams
consumed by observers of the <Self-Adaptation Platform> and by
<MODAClouds IDE> via the feedback loop
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers of the <Self-Adaptation Platform> are registered
to receive data from these measures.
3. <MODAClouds IDE> has provided to <Monitoring Platform>
indications of which metrics should be estimated
Postconditions
Estimated measures provided in output by
<Monitoring Platform> on one or more streams
Other requirements:
1. For a given application, the Monitoring Platform MUST be able to estimate, if requested by the
monitoring queries and the monitoring data is available, at least mean value of traffic arrival rates,
service demand, number of active users, throughputs, failure events, startup times/uptimes/downtimes.
2. For some of the same performance indicators, the Monitoring Platform MAY be able to provide an
estimate of the variance and percentiles over a reference time window.
3. If requested by the monitoring queries, the Monitoring Platform MUST be able to differentiate the
estimation across workload classes and different resources.
4. The estimation CAN depend on the runtime models, when this dependence does not introduce a circular
dependence that cannot be resolved.
5. The estimation COULD return confidence information on the estimates.
6. The estimation MUST support timeouts for the algorithms and MUST cope with abnormal termination and
infeasibilities in the solutions without cascading errors into the dependent systems.
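As an example of a metric that is not directly observable, the per-request service demand named in requirement 1 can be derived from measured utilization and throughput via the utilization law U = X · D. The formula is standard queueing theory, not something mandated by the deliverable; a sketch:

```python
def estimate_service_demand(utilization, throughput):
    """Estimate per-request service demand D from the utilization law U = X * D.

    `utilization` is the measured resource busy fraction (0..1) and
    `throughput` the observed completion rate (req/s); D itself is not
    exposed by any probe, so it is derived rather than observed.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive to estimate demand")
    return utilization / throughput
```

For example, a resource that is 60% busy while completing 30 req/s has an estimated demand of 0.02 s per request.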
5.4.4. Use case specification for the Forecast Measure use case
5.4.4.1. Use case description
Use case heading Description
Use case name Forecast Measure
Use case ID UC-MC.wp6.Analysis.Forecast Measure.-V01
Priority of accomplishment Should Have
Use case
description
Description
Use case
diagram
See the Forecast Measure use case of the system boundary model in Section 5.4.1.2.
Goal These services will forecast, using statistical methods, some of the metrics needed by the
<Self-Adaptation Platform> to manage the application QoS.
Main Actors 1. <MODAClouds IDE>
2. <Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Forecasting in black-box mode
1. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
2. For each monitoring metric to be forecasted, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for forecasting, returning an error if not
3. <Monitoring Platform> continuously runs forecasting algorithms to
predict the value of the metrics on the input streams
4. The results of the blackbox forecasting algorithms are put in output on
streams consumed by observers of the <Self-Adaptation Platform>
Forecasting in white-box mode
1. <Monitoring Platform> parses the specification provided by
<MODAClouds IDE>
2. For each monitoring metric to be forecasted, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for forecasting, returning an error if not
3. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
4. <Monitoring Platform> continuously runs whitebox forecasting
algorithms to predict the value of the metrics on the input streams based
on the Application Models and the topology information
5. The results of the whitebox forecasting algorithms are put in output on
streams consumed by observers of the <Self-Adaptation Platform>
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers are registered to receive data from these
measures, normally from <Self-Adaptation Platform>
Postconditions
1. Forecasted measures are given in output on one or more output
streams
Other requirements:
1. The Monitoring Platform MUST be able to carry out forecasting at predefined times or periodically with a
given period included in the <MODAClouds IDE> specification.
2. The forecasting MAY depend on the application models, when this dependence does not introduce a
circular dependence that cannot be resolved.
3. The forecasting MUST support timeouts to provide forecasts and MUST cope with abnormal
termination and infeasibilities in the predictions without generating errors in the dependent systems.
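One of the simplest statistical methods fitting the black-box scenario is simple exponential smoothing over the input stream; the method choice and the `alpha` weight are illustrative, not prescribed by the deliverable:

```python
def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Each new observation pulls the smoothed level toward itself with weight
    `alpha`; the final level is the forecast for the next point on the stream.
    """
    if not series:
        raise ValueError("empty input stream")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

A white-box variant would additionally condition the prediction on the application model and topology, as in the second scenario.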
5.4.5. Use case specification for the Feedback Measure use case
Use case heading Description
Use case name Feedback Measure
Use case ID UC-MC.wp6.Analysis.Feedback Measure.-V01
Priority of
accomplishment
Must Have
Use case description Description
Use case diagram See the Feedback Measure use case of the system boundary model in Section
5.4.1.2.
Goal Return a measure to <MODAClouds IDE> to support the design-runtime
feedback loop.
Main Actors 1. <MODAClouds IDE>
Use case scenarios Description
Main success scenarios
1. <MODAClouds IDE> requests <Monitoring Platform> to provide
feedback on a set of raw metrics or high-level measures.
2. <Monitoring Platform> creates feedback streams for the data to be
pushed to <MODAClouds IDE>.
3. <Monitoring Platform>, following the input specification provided by
<MODAClouds IDE>, binds either raw metrics streams or measures to
the feedback streams.
4. <Monitoring Platform> deactivates the feedback streams upon request of
<MODAClouds IDE> or when <Cloud app> is not running.
Preconditions
1. <Monitoring Platform> running
2. <Cloud app> deployed, not necessarily running
Postconditions
1. Feedback streams push raw metrics and measurements to
<MODAClouds IDE>
5.5. Self-Adaptivity Requirements
5.5.1. Context and System Overview
5.5.1.1. Context
Use case template Description
Category name Self-Adaptivity
The scope of the following use case specifications is to define the Self-Adaptation management services of the
MODAClouds runtime environment.
5.5.1.2. System Boundary Model
Figure 5.5.a: Self-Adaptivity Requirements
5.5.2. Use case specification for the Define/Undefine QoS Constraints use case
Use case heading Description
Use case name Define/Undefine QoS Constraints
Use case ID UC-MC.wp6.Self-Adaptivity.Define/Undefine QoS Constraints.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Define/Undefine QoS Constraints use case of the system boundary model in
Section 5.5.1.2.
Goal These services allow defining or undefining in the <Self-Adaptation Platform> a
set of QoS constraints for <Cloud app> specified by the <QoS engineer> in the
<MODACloudsIDE>.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
define/undefine a set of QoS Constraints for <Cloud app>.
2. <Self-Adaptation Platform> stores the information, adds a log entry for
the operation, checks correctness of the information received.
3. <Self-Adaptation Platform> returns a success or failure code to
<MODACloudsIDE>.
Preconditions 1. <Cloud app> is deployed on <Application Cloud>
2. <Self-Adaptation Platform> is deployed and running on <Service
Cloud>
Postconditions 1. <Self-Adaptation Platform> updated its internal information to
define/undefine the QoS Constraints.
Other requirements:
1. The correctness of the QoS Constraints specification SHOULD be also checked by <MODAClouds
IDE>.
2. QoS Constraints MUST be specified in a parsable interchange format, e.g., an SLA specification in XML.
3. Define QoS Constraints SHOULD be automatically invoked by <Execution Platform> when running the
<Deploy Application> use case.
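Requirement 2 asks for a parsable interchange format such as an XML SLA specification. A Python sketch with a hypothetical schema — the element and attribute names below are invented for illustration; the deliverable does not fix them here:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML SLA for a <Cloud app>; the schema is illustrative only.
SLA_XML = """
<sla app="cloud-app-1">
  <constraint metric="responseTime" operator="lt" value="500" unit="ms"/>
  <constraint metric="availability" operator="ge" value="0.99"/>
</sla>
"""

def parse_constraints(xml_text):
    """Parse the hypothetical XML SLA into (metric, operator, value) tuples."""
    root = ET.fromstring(xml_text)
    return [(c.get("metric"), c.get("operator"), float(c.get("value")))
            for c in root.findall("constraint")]
```

A correctness check of the parsed constraints (requirement 1) could then run both in <MODAClouds IDE> and in the <Self-Adaptation Platform>.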
5.5.3. Use case specification for the Start/Stop Feedback of Self-Adaptivity Data use case
Use case heading Description
Use case name Start/Stop Feedback of Self-Adaptivity Data
Use case ID UC.MC.wp6.Self-Adaptivity.Start/Stop Feedback of Self-Adaptivity Data-
V01
Priority of accomplishment Could Have
Use case
description
Description
Use case
diagram
See the Start/Stop Feedback of Self-Adaptivity Data use case of the system boundary model in
Section 5.5.1.2.
Goal Return detailed data on the actions taken by the <Self-Adaptation Platform> in a reference
time horizon and their outcomes.
Main Actors <MODAClouds IDE>
<Monitoring Platform>
<Execution Platform>
Use case scenarios Description
Main success scenarios
1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
start/stop feedback of self-adaptivity data in a reference time horizon.
2. <Self-Adaptation Platform> configures the runtime models to
record/stop recording data.
3. <Self-Adaptation Platform> registers/deregisters with <Monitoring
Platform> to start/stop a Data Collector of the self-adaptivity data.
4. <Self-Adaptation Platform> returns to <MODACloudsIDE> the success
or failure of the operation.
Preconditions 1. <Self-Adaptation Platform> deployed and running on <Service Cloud>
2. Runtime models running for <Cloud app>.
Postconditions 1. Data collectors for self-adaptivity data started/stopped.
5.5.4. Use case specification for the Define/Undefine Cost Constraints use case
Use case heading Description
Use case name Define/Undefine Cost Constraints
Use case ID UC.MC.wp6.Self-Adaptivity.Define/Undefine Cost Constraints-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Define/Undefine Cost Constraints use case of the system boundary model in Section
5.5.1.2.
Goal These services will allow defining or undefining in the <Self-Adaptation Platform> a set of
Cost constraints for <Cloud app> or <Monitoring Platform> specified by the <QoS
engineer> in <MODACloudsIDE>.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
<Execution Platform>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
define/undefine a set of Cost Constraints for <Cloud app>.
2. <Self-Adaptation Platform> stores the information, adds a log entry for
the operation, checks correctness of the information received.
3. <Self-Adaptation Platform> returns a success or failure code to
<MODACloudsIDE>.
Preconditions 1. <Cloud app> is deployed on <Application Cloud>
2. <Self-Adaptation Platform> is deployed and running on <Service
Cloud>
Postconditions 1. <Self-Adaptation Platform> updated its internal information to
define/undefine the cost constraints.
Other requirements:
1. The correctness of the Cost Constraints specification SHOULD also be checked by <MODAClouds IDE>.
2. Define Cost Constraints SHOULD be automatically invoked by <Execution Platform> when running the
<Deploy Application> use case.
5.6. Roadmap
In this section, we describe the roadmap for year 1 activities. In this first year of the project, the MODAClouds
consortium will focus on realizing the initial prototypes for the <Monitoring Platform> and for the execution
platform. The Self-Adaptation platform and the multi-cloud deployment components will be included in the
workplan for the following years. Interfaces between these components will be specified in year 1.
In terms of target clouds, in Year 1 at least the Amazon EC2 and Flexiscale IaaS platforms will be
considered. The IaaS focus will continue in Year 2, when initial support for PaaS will be provided. We envision
at this stage that the focus will shift more towards PaaS in the last year of the project.
In terms of timelines for implementation of the requirements, the following table outlines the general roadmap:
# Group Use case scenarios (UC-MC.wp6.*) Priority Year(s)
1 Execution Run Application Must Have 1,2
2 Execution Start/Stop Application Sandbox Should Have 2,3
3 Execution Synchronise Application Data Must Have 2,3
4 Execution Deploy Application Must Have 1,2
5 Monitoring Collect Monitoring Data Must Have 1,2
6 Monitoring Distribute Data Must Have 1
7 Monitoring Install Monitoring Rule Must Have 1,2
8 Monitoring Activate/Deactivate Monitoring Rule Must Have 1,2
10 Monitoring Add/Remove Observer Must Have 1,2
11 Analysis Detect Violation Must Have 2
12 Analysis Correlate Monitoring Data Should Have 2
13 Analysis Estimate Measure Must Have 1,2,3
14 Analysis Forecast Measure Must Have 2,3
15 Analysis Feedback Measure Must Have 2
16 Self-Adaptivity Define/Undefine QoS Constraints Must Have 2
17 Self-Adaptivity Start/stop Feedback of Self-Adaptivity Data Could Have 3
18 Self-Adaptivity Define/Undefine Cost Constraints Must Have 2,3
References
[Aba05] Abadi, D. J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W.,
Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S. The Design of the Borealis Stream
Processing Engine. In Proc. Intl. Conf. on Innovative Data Systems Research (CIDR 2005), 2005.
[Aba05b]: Abadi, D. J., Madden, S., Lindner, W. REED: Robust, efficient filtering and event detection in sensor
networks. In VLDB, 2005.
[Abh12] Abhishek, V, Kash, I, Key, P. Fixed and market pricing for cloud services. International Conference on
Computer Communications Workshops (INFOCOM WKSHPS). 2012.
[ABS1] AWS Elastic Beanstalk -- Developer Guide -- What Is AWS Elastic Beanstalk and Why Do I Need It?
(accessed in 2013); http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
[ABS2] AWS Elastic Beanstalk -- Developer Guide -- How Does AWS Elastic Beanstalk Work? (accessed in
2013); http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/concepts.html
[ABS3] AWS Elastic Beanstalk -- Developer Guide -- Components (accessed in 2013);
http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/concepts.components.html
[ABS4] AWS Elastic Beanstalk -- Developer Guide -- Managing and Configuring Applications and
Environments Using the Console, CLI, and APIs (accessed in 2013);
http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/using-features.html
[ABS5] AWS Elastic Beanstalk -- Developer Guide -- Customizing and Configuring AWS Elastic Beanstalk
Environments (accessed in 2013); http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-
containers.html
[ACF1] AWS CloudFormation (accessed in 2013); https://aws.amazon.com/cloudformation
[ACF2] AWS CloudFormation -- User Guide (accessed in 2013);
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide
[ACF3] AWS CloudFormation -- FAQ (accessed in 2013); https://aws.amazon.com/cloudformation/faqs
[ACF4] AWS CloudFormation -- Templates (accessed in 2013); https://aws.amazon.com/cloudformation/aws-
cloudformation-templates
[Aga07] Agarwala, S, Alegre, F, Schwan, K, Mehalingham, J. E2EProf: Automated End-to-End Performance
Management for Enterprise Systems. IEEE/IFIP International Conference on Dependable Systems and Networks
(DSN). 2007.
[Ala09] I. Al-Azzoni and D. Down. Decentralized Load Balancing for Heterogeneous Grids. Proceedings of the
2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns
(COMPUTATIONWORLD '09), 2009.
[Alt06] Altman, E, Boulogne, T, Azouzi, R, Jiménez, T, Wynter, L. A survey on networking games in
telecommunications. Computers & Operations Research. 2006.
[ANA13] ANA network architecture. Last visited on March 16, 2013. http://www.ana-project.org/web/
[APF2] AppFog Documentation -- Languages (accessed in 2013); https://docs.appfog.com/languages
[APF3] AppFog Documentation -- Services (accessed in 2013); https://docs.appfog.com/services
[APF4] AppFog Documentation -- Feature Roadmap (accessed in 2013); https://docs.appfog.com/roadmap
[APF5] AppFog Documentation -- Tunneling (accessed in 2013); https://docs.appfog.com/services/tunneling
[Ara03] Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Nishizawa, I., Rosenstein, J., and Widom, J.
STREAM: The Stanford Stream Data Manager (Demonstration Description). In Proc. ACM Intl. Conf. on
Management of data (SIGMOD 2003), page 665, 2003.
[Ara06] Arasu, A., Babu, S., and Widom, J. The CQL Continuous Query Language: Semantic Foundations and
Query Execution. The VLDB Journal, 15(2):121-142, 2006.
[Ard08] D. Ardagna, C. Ghezzi, R. Mirandola. Rethinking the use of models in software architecture. QoSA
2008 Proceedings, 1-27, Karlsruhe, Germany, October 2008.
[Ard11] Ardagna, D, Casolari, S, Panicucci, B. Flexible distributed capacity allocation and load redirect
algorithms for cloud systems. IEEE International Conference on Cloud Computing (CLOUD). 2011.
[Ard11b] D. Ardagna, S. Casolari, B. Panicucci. Flexible distributed capacity allocation and load redirect
algorithms for cloud systems. Cloud Computing (CLOUD), 2011 IEEE International Conference on, 2011, 163-
170.
[Ard11c] Ardagna, D, Panicucci, B, Passacantando, M. A game theoretic formulation of the service provisioning
problem in cloud systems. International Conference on World Wide Web. 2011.
[Ard12] Ardagna, D, Panicucci, B, Passacantando, M. Generalized nash equilibria for the service provisioning
problem in cloud systems. IEEE Transactions on Services Computing. 2012.
[Ard12b] D. Ardagna, S. Casolari, M. Colajanni, B. Panicucci. Dual Time-scale Distributed Capacity Allocation
and Load Redirect Algorithms for Cloud Systems. Journal of Parallel and Distributed Computing, Elsevier.
72(6), 796-808, 2012.
[Ard12c] Ardagna, D, Panicucci, B, Trubian, M, Zhang, L. Energy-aware autonomic resource allocation in
multi-tier virtualized environments. IEEE Transactions on Services Computing. 2012.
[AUT13] The autonomic Internet. Last visited on March 16, 2013. http://ist-autoi.eu/autoi/index.php
[Bab01] Babu, S., and Widom, J. Continuous Queries over Data Streams. SIGMOD Rec., 30(3):109-120, 2001.
[Bai06] Bai, Y., Thakkar, H., Wang, H., Luo, C., and Zaniolo, C. A Data Stream Language and System
Designed for Power and Extensibility. In Proc. Intl. Conf. on Information and Knowledge Management (CIKM
2006), pages 337-346, 2006.
[Bal04] Balakrishnan, H., Balazinska, M., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Galvez, E.,
Salz, J., Stonebraker, M., Tatbul, N., Tibbetts, R., and Zdonik, S. Retrospective on Aurora. The VLDB Journal,
13(4):370-383, 2004.
[Bam99] Bamieh, B, Giarré, L. Identification of linear parameter varying models. IEEE Conference on Decision
and Control. 1999.
[Bar04] Barham, P, Donnelly, A, Isaacs, R, Mortier, R. Using Magpie for request extraction and workload
modelling. USENIX Symposium on Operating Systems Design & Implementation (OSDI). 2004.
[Bar10]: Baresi, L., Caporuscio, M., Ghezzi, C., and Guinea, S. Model-Driven Management of Services.
Proceedings of the Eighth European Conference on Web Services, ECOWS. IEEE Computer Society, 2010, pp.
147-154.
[Bar12]: Baresi, L., Guinea, S. Event-based Multi-level Service Monitoring. 2012.
[Ben04] Bennani, M, Menasce, D. Assessing the robustness of self-managing computer systems under highly
variable workloads. International Conference on Autonomic Computing (ICAC). 2004.
[Ben05] Bennani, M, Menasce, D. Resource allocation for autonomic data centers using analytic performance
models. International Conference on Autonomic Computing (ICAC). 2005.
[Bjo12] Björkqvist, M, Chen, L, Binder, W. Opportunistic service provisioning in the cloud. International
Conference on Cloud Computing. 2012.
[Bla09]: Blair, G., Bencomo, N., France, R. Models@run.time. Computer, vol. 42, no. 10, pp. 22-27, Oct. 2009.
[Bra10]: Brandic, I. FoSII Project: Autonomic Resource Management in Clouds Considering Cloud-based
Resource Monitoring and Knowledge Management. Seoul National University, Seoul, South Korea, July 15th
2010.
[Cal12] Calcavecchia, N, Caprarescu, B, Nitto, E, Dubois, D, Petcu, D. Depas: A decentralized probabilistic
algorithm for auto-scaling. Computing. 2012.
[Cap10] Bogdan Alexandru Caprarescu, Nicolo Maria Calcavecchia, Elisabetta Di Nitto, and Daniel J. Dubois.
SOS Cloud: Self-organizing services in the cloud. In BIONETICS, pages 48-55, 2010.
[Car01]: Carzaniga, A., Rosenblum, D. S., Wolf, A. L. Design and Evaluation of a Wide-Area Event
Notification Service. ACM Transactions on Computer Systems, vol. 19, no. 3, pp. 332-383, August, 2001.
[Cas08a] Casale, G, Cremonesi, P, Turrin, R. Robust workload estimation in queueing network performance
models. Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP). 2008.
[Cas08b] Casale, G, Mi, N, Cherkasova, L, Smirni, E. How to parameterize models with bursty workloads.
ACM SIGMETRICS Performance Evaluation Review. 2008.
[CAS13] CASCADAS. Last visited on March 16, 2013. http://acetoolkit.sourceforge.net/cascadas/index.php
[CDF1] Cloudify documentation -- Anatomy of a recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/recipes_overview
[CDF2] Cloudify documentation -- Scaling rules (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/scaling_rules
[CDF3] Cloudify documentation -- Bootstrapping any cloud (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/bootstrapping/bootstrapping_cloud
[CDF4] Cloudify documentation -- Application recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/application_recipe
[CDF5] Cloudify documentation -- Service recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/service_recipe
[CDF6] Cloudify documentation -- Configuring security (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/setup/configuring_security
[CDF7] Cloudify documentation -- Attributes API (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/attributes_api
[CDF8] Cloudify documentation -- Custom commands (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/custom_commands
[CDF9] Cloudify documentation -- Probes (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/plugins
[CDF10] Cloudify documentation -- Cloud driver (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/clouddrivers/cloud_driver
[CFY1] Cloud Foundry -- FAQ (accessed in 2013); http://docs.cloudfoundry.com/faq.html#limits
[CFY2] Cloud Foundry -- Services (accessed in 2013); http://docs.cloudfoundry.com/services.html
[CFY3] Cloud Foundry -- Frameworks (accessed in 2013); http://docs.cloudfoundry.com/frameworks.html
[CFY4] Micro Cloud Foundry (accessed in 2013); https://micro.cloudfoundry.com/
[Cha82] Chandy, K, Neuse, D. Linearizer: A heuristic algorithm for queuing network models of computing
systems. Communications of the ACM. 1982.
[Cha12] Rong Chang, editor. 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI,
USA, June 24-29, 2012. IEEE, 2012.
[Che00a] L. Cherkasova, M. DeSouza and S. Ponnekanti. Performance Analysis of "Content-Aware" Load
Balancing Strategy FLEX: Two Case Studies. In Proceedings of the Thirty-Fourth Hawaii International Conference
on System Sciences (HICSS-34), Software Technology Track, January 3-6, 2001.
[Che00b] Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. NiagaraCQ: A Scalable Continuous Query System for
Internet Databases. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, Proc. ACM Intl. Conf. on
Management of Data (SIGMOD 2000), pages 379-390, 2000.
[Che06] Tu, Y.-C., Liu, S., Prabhakar, S., and Yao, B. Load Shedding in Stream Databases: A Control-based
Approach. In Proc. Intl. Conf. on Very Large Data Bases (VLDB 2006), pages 787-798, 2006.
[Che08] Chen, Y, Iyer, S, Liu, X, Milojicic, D, Sahai, A. Translating Service Level Objectives to lower level
policies for multi-tier services. Cluster Computing. 2008.
[Coh04] Cohen, I, Goldszmidt, M, Kelly, T, Symons, J, Chase, J. Correlating instrumentation data to system
states: A building block for automated diagnosis and control. USENIX Symposium on Operating Systems
Design & Implementation (OSDI). 2004.
[Coh05] Cohen, I, Zhang, S, Goldszmidt, M, Symons, J, Kelly, T, Fox, A. Capturing, indexing, clustering, and
retrieving system history. ACM symposium on Operating systems principles (SOSP). 2005.
[Cor05]: Cormode, G., Garofalakis, M. N. Sketching streams through the net: Distributed approximate query
tracking. In VLDB, 2005, pp. 13-24.
[Cre02] Cremonesi, P, Schweitzer, P, Serazzi, G. A unifying framework for the approximate solution of closed
multiclass queuing networks. IEEE Transactions on Computers. 2002.
[Cre10] Cremonesi, P, Dhyani, K, Sansottera, A. Service time estimation with a refinement enhanced hybrid
clustering algorithm. International Conference on Analytical and Stochastic Modeling Techniques and
Applications (ASMTA). 2010.
[Cre12] Cremonesi, P, Sansottera, A. Indirect estimation of service demands in the presence of structural
changes. International Conference on Quantitative Evaluation of Systems (QEST). 2012.
[Cza98]: Czajkowski, G., Eicken, T. V. JRes: A Resource Accounting Interface for Java. Proceedings of the
13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 1998.
[Dea12] Dean, D, Nguyen, H, Gu, X. UBL: Unsupervised behavior learning for predicting performance
anomalies in virtualized cloud systems. International Conference on Autonomic Computing (ICAC). 2012.
[Des12] Desnoyers, P, Wood, T, Shenoy, P, Patil, S, Vin, H. Modellus: Automated modeling of complex data
center applications. ACM Transactions on Internet Technology. 2012.
[Di12] Di, S, Kondo, D, Cirne, W. Host load prediction in a Google compute cloud with a Bayesian model.
International Conference on High Performance Computing, Networking, Storage and Analysis (SC). 2012.
[Dou12] Brian Dougherty, Jules White, and Douglas C. Schmidt. Model-driven auto-scaling of green cloud
computing infrastructure. Future Generation Comp. Syst., 28(2):371-378, 2012.
[Dua09] Duan, S, Babu, S, Munagala, K. Fa: A system for automating failure diagnosis. IEEE International
Conference on Data Engineering (ICDE). 2009.
[Dub07] Parijat Dube, Zhen Liu, Laura Wynter, and Cathy H. Xia. Competitive equilibrium in e-commerce:
Pricing and outsourcing. Computers & OR, 34(12):3541-3559, 2007.
[Dut10] Dutreilh, X, Rivierre, N, Moreau, A, Malenfant, J, Truck, I. From data center resource allocation to
control theory and back. International Conference on Cloud Computing. 2010.
[Dut12] Sourav Dutta, Sankalp Gera, Akshat Verma, and Balaji Viswanathan. Smartscale: Automatic
application scaling in enterprise clouds. In Chang [Cha12], pages 221-228.
[Eme12]: Emeakaroha, V. C., Ferreto, T. C., Netto, M. A. S., Brandic, I., Rose, De C. A. F. CASViD:
Application Level Monitoring for SLA Violation Detection in Clouds. 2012.
[Fen11] Yuan Feng, Baochun Li, and Bo Li. Price competition in an oligopoly cloud market. 2011.
[Fer13] Fernandez, R. C., Migliavacca, M., Kalyvianaki, E., and Pietzuch, P. Integrating Scale Out and Fault
Tolerance in Stream Processing using Operator State Management. In SIGMOD, 2013. To appear.
[GAE1] GAE Documentation --- The Java Servlet Environment (accessed in 2013);
https://developers.google.com/appengine/docs/java/runtime
[GAE2] GAE Documentation --- Quotas (accessed in 2013);
https://developers.google.com/appengine/docs/quotas
[GAE3] GAE Documentation --- Backends and Java API Overview (accessed in 2013);
https://developers.google.com/appengine/docs/java/backends/overview
[GAE4] GAE Documentation --- Datastore Overview (accessed in 2013);
https://developers.google.com/appengine/docs/java/datastore/overview
[GAE5] GAE Documentation --- Java Service APIs (accessed in 2013);
https://developers.google.com/appengine/docs/java/apis
[Gam12] Gambi, A, Toffetti, G. Modeling cloud performance with Kriging. International Conference on
Software Engineering (ICSE). 2012.
[Gia11] Giani, P, Tanelli, M, Lovera, M. Controller design and closed-loop stability analysis for admission
control in Web service systems. World Congress. 2011.
[Gma07] Gmach, D, Rolia, J, Cherkasova, L, Kemper, A. Workload analysis and demand prediction of
enterprise data center applications. International Symposium on Workload Characterization (IISWC). 2007.
[Gou11] Hadi Goudarzi and Massoud Pedram. Multi-dimensional SLA-based resource allocation for multi-tier
cloud computing systems. In Liu and Parashar [Liu11], pages 324-331.
[Gol03] Golab, L., DeHaan, D., Demaine, E. D., Lopez-Ortiz, A., and Munro, J. I. Identifying Frequent Items in
Sliding Windows over On-line Packet Streams. In Proc. Intl. Conf. on Internet Measurement (IMC 2003), pages
173-178, 2003.
[Gol08] Golab, L., Johnson, T., Koudas, N., Srivastava, D., and Toman, D. Optimizing Away Joins on Data
Streams. In Proc. Intl. Workshop on Scalable Stream Processing System (SSPS 2008), pages 48-57, 2008.
[Gol09]: Goldsack, P., Guijarro, J., Loughran, S., et al. The SmartFrog configuration management framework.
ACM SIGOPS Oper. Syst. Rev., 2009, 43, pp. 16-25.
[Gon11]: Gonzalez, J., Munoz, A., Mana, A. Multi-layer Monitoring for Cloud Computing. IEEE 13th
International Symposium on High-Assurance Systems Engineering 2011.
[GPA13] General purpose autonomic computing. Last visited on March 16, 2013. http://www-users.aston.ac.uk/~calinerc/gpac.html
[GT13] Google Trends, results for "cloud computing" (accessed in 2013);
http://www.google.com/trends?q=cloud+computing
[Gul12] Gulisano, V., Jiménez-Peris, R., et al. StreamCloud: An Elastic and Scalable Data Streaming System.
TPDS, 99(PP), 2012.
[Had12] Makhlouf Hadji and Djamal Zeghlache. Minimum cost maximum flow algorithm for dynamic resource
allocation in clouds. In Chang [Cha12], pages 876-882.
[Has11] Hassan, M, Song, B, Huh, E. Distributed resource allocation games in horizontal dynamic cloud
federation platform. International Conference on High Performance Computing and Communications (HPCC).
2011.
[Has12] Hassan, M, Hossain, M, Sarkar, A, Huh, E. Cooperative game-based distributed resource allocation in
horizontal dynamic cloud federation platform. Information Systems Frontiers. 2012.
[He12] Ting He, Shiyao Chen, Hyoil Kim, Lang Tong, and Kang-Won Lee. Scheduling parallel tasks onto
opportunistically available cloud resources. In Chang [Cha12], pages 180-187.
[HER1] Heroku DevCenter -- The Process Model (accessed in 2013);
https://devcenter.heroku.com/articles/process-model
[HER2] Heroku DevCenter -- Dynos and the Dyno Manifold (accessed in 2013);
https://devcenter.heroku.com/articles/dynos
[HER3] Heroku DevCenter -- Languages (accessed in 2013); https://devcenter.heroku.com/categories/language-
support
[HER4] Heroku DevCenter -- Buildpacks (accessed in 2013); https://devcenter.heroku.com/articles/buildpacks
[HER5] Heroku DevCenter -- Scaling Your Process Formation (accessed in 2013);
https://devcenter.heroku.com/articles/scaling
[HER6] Heroku Add-ons (accessed in 2013); https://addons.heroku.com/
[HER7] Heroku DevCenter -- HTTP Routing and the Routing Mesh (accessed in 2013);
https://devcenter.heroku.com/articles/http-routing
[HER8] Heroku DevCenter -- Slug Compiler (accessed in 2013); https://devcenter.heroku.com/articles/slug-
compiler
[HER9] Heroku API (accessed in 2013); https://api-docs.heroku.com/
[HER10] Heroku DevCenter -- Frequently Asked Questions about Java (accessed in 2013);
https://devcenter.heroku.com/articles/java-faq
[HER11] The Twelve-Factor App (accessed in 2013); http://12factor.net/
[Hol09] Holub, V., Parsons, T., O'Sullivan, P., Murphy, J. Runtime correlation engine for system monitoring and
testing. In ICAC-INDST '09 Proceedings of the 6th international conference industry session on Autonomic
computing and communications industry session, pages 9-18, New York, NY, USA, 2009. ACM.
[Hol10] Holze, M, Haschimi, A, Ritter, N. Towards workload-aware self-management: Predicting significant
workload shifts. International Conference on Data Engineering Workshops (ICDEW). 2010.
[Hol10b]: Holub, V., Parsons, T., O'Sullivan, P. Run-Time Correlation Engine for System Monitoring and
Testing (RTCE). 2010.
[Hue05]: Huebsch, R., Chun, B. N., Hellerstein, J. M., Loo, B. T., Maniatis, P., Roscoe, T., Shenker, S., Stoica,
I., Yumerefendi, A. R. The architecture of PIER: an Internet-scale query processor. In CIDR, 2005.
[Jag95] Jagadish, H. V., Mumick, I. S., and Silberschatz, A. View Maintenance Issues for the Chronicle Data
Model. In Proc. ACM Symp. on Principles of Database Systems (PODS 1995), pages 113-124, 1995.
[Jer97]: Jerding, D. F., Stasko, J. T., Ball, T. Visualizing Interactions in Program Executions. Proceedings of the
International Conference on Software Engineering, 1997.
[JUJ1] Juju Documentation -- Frequently Asked Questions (accessed in 2013); https://juju.ubuntu.com/docs/faq.html
[JUJ2] Juju Charms (accessed in 2013); http://jujucharms.com/charms
[JUJ3] Juju Documentation (accessed in 2013); https://juju.ubuntu.com/docs
[JUJ4] Juju Documentation -- Getting started (accessed in 2013); https://juju.ubuntu.com/docs/getting-
started.html
[JUJ5] Juju Documentation -- User tutorial (accessed in 2013); https://juju.ubuntu.com/docs/user-tutorial.html
[JUJ6] Juju Documentation -- Charms (accessed in 2013); https://juju.ubuntu.com/docs/charm.html
[JUJ7] Juju Documentation -- Service configuration (accessed in 2013); https://juju.ubuntu.com/docs/service-
config.html
[JUJ8] Juju Documentation -- Machine constraints (accessed in 2013);
https://juju.ubuntu.com/docs/constraints.html
[JUJ9] Juju Documentation -- Operating systems (accessed in 2013); https://juju.ubuntu.com/docs/operating-
systems.html
[Jun09] Jung, G, Joshi, K, Hiltunen, M, Schlichting, R, Pu, C. A Cost-sensitive adaptation engine for server
consolidation of multitier applications. Middleware. 2009.
[Jun10] Jung, G, Hiltunen, M, Joshi, K, Schlichting, R, Pu, C. Mistral: Dynamically managing power,
performance, and adaptation cost in cloud infrastructures. International Conference on Distributed Computing
Systems (ICDCS). 2010.
[Kal09] Kalyvianaki, E, Charalambous, T, Hand, S. Self-adaptive and self-configured CPU resource
provisioning for virtualized servers using Kalman filters. International Conference on Autonomic Computing
(ICAC). 2009.
[Kal11] Kalbasi, A, Krishnamurthy, D, Rolia, J, Richter. MODE: mix driven on-line resource demand
estimation. International Conference on Network and Services Management (CNSM). 2011.
[Kal12] Kalbasi, A, Krishnamurthy, D, Rolia, J, Dawson, S. DEC: service demand estimation with confidence.
IEEE Transactions on Software Engineering. 2012.
[Kar11] Kari, C., Kim, Y.-A., Russell, A. Data Migration in Heterogeneous Storage Systems. 2011 31st
International Conference on Distributed Computing Systems (ICDCS), pp. 143-150, 20-24 June 2011.
[Kel79] Kelly, F. Reversibility and Stochastic Networks. Cambridge University Press. 1979.
[Kha12] Khan, A, Yan, X, Tao, S, Anerousis, N. Workload characterization and prediction in the cloud: A
multiple time series approach. IEEE/IFIP International Workshop on Cloud Management (Cloudman). 2012.
[Kir10]: Kirschnick, J., Calero, J. A. M., Wilcock, L., Edwards, N. Towards an architecture for the automated
provisioning of cloud services. IEEE Commun. Mag., 2010, 48, (12), pp. 124-131.
[Kle75] Kleinrock, L. Queueing Systems. Wiley-Interscience. 1975.
[Kon12] Kleopatra Konstanteli, Tommaso Cucinotta, Konstantinos Psychas, and Theodora A. Varvarigou.
Admission control for elastic cloud services. In Chang [Cha12], pages 41-48.
[Kon12b]: König, B., Calero, J. A. M., Kirschnick, J. Elastic monitoring framework for cloud infrastructures.
Communications, IET, vol. 6, num. 10, pp. 1306-1315, July, 2012.
[Law04] Law, Y.-N., Wang, H., and Zaniolo, C. Query Languages and Data Models for Database Sequences and
Data Streams. In Proc. Intl. Conf. on Very Large Data Bases (VLDB 2004), pages 492-503, 2004.
[Law05] Law, Y.-N., and Zaniolo, C. An Adaptive Nearest Neighbor Classification Algorithm for Data
Streams. In Proc. Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005),
pages 108-120, 2005.
[Lee99] Lee, L, Poolla, K. Identification of linear parameter-varying systems using nonlinear programming.
Transactions-American Society Of Mechanical Engineers Journal Of Dynamic Systems Measurement And
Control. 1999.
[Lim10] Lim, H, Babu, S, Chase, J. Automated control for elastic storage. International Conference on
Autonomic Computing (ICAC). 2010.
[Lin12] Yi-Kuei Lin and Ping-Chen Chang. Reliability evaluation of a computer network in cloud computing
environment subject to maintenance budget. Applied Mathematics and Computation, 219(8):3893-3902, 2012.
[Lit05] Litoiu, M, Woodside, C, Zheng, T. Hierarchical model-based autonomic control of software systems.
ACM SIGSOFT Software Engineering Notes. 2005.
[Liu99] Liu, L., Pu, C., and Tang, W. Continual Queries for Internet Scale Event-Driven Information Delivery.
IEEE Trans. Knowl. Data Eng., 11(4):610-628, 1999.
[Liu05] Liu, Y, Gorton, I, Fekete, A. Design-level performance prediction of component-based applications.
IEEE Transactions on Software Engineering. 2005.
[Liu06] Liu, Z, Wynter, L, Xia, C, Zhang, F. Parameter inference of queueing models for IT systems using end-
to-end measurements. ACM SIGMETRICS Performance Evaluation Review. 2006.
[Liu10] Liu, T, Methapatara, C, Wynter, L. Revenue management model for on-demand it services. European
Journal of Operational Research. 2010.
[Liu11] Ling Liu and Manish Parashar, editors. IEEE International Conference on Cloud Computing, CLOUD
2011, Washington, DC, USA, 4-9 July, 2011. IEEE, 2011.
[Lov98] Lovera, M, Verhaegen, M, Chou, C. State space identification of MIMO linear parameter varying
models. International Symposium on the Mathematical Theory of Networks and Systems. 1998.
[Lu02] Lu, C., Alvarez, G. A. and Wilkes, J. 2002. Aqueduct: online data migration with performance
guarantees. In Proceedings of the 1st USENIX conference on File and storage technologies (FAST'02). USENIX
Association, Berkeley, CA, USA, 18-18.
[Lu03] Lu, Y, Abdelzaher, T, Lu, C, Sha, L, Liu, X. Feedback control with queueing-theoretic prediction for
relative delay guarantees in web servers. IEEE Real-Time and Embedded Technology and Applications
Symposium. 2003.
[Lu09] Lu, Y, AbouRizk, S. Automated Box-Jenkins forecasting modelling. Automation in Construction. 2009.
[Mad02]: Madden, S., Franklin, M. J., Hellerstein, J. M., Hong, W. Tag: A tiny aggregation service for ad-hoc
sensor networks. In OSDI, 2002.
[Mal11] Malkowski, S, Hedwig, M, Li, J, Pu, C, Neumann, D. Automated control for elastic n-tier workloads
based on empirical modeling. International Conference on Autonomic Computing (ICAC). 2011.
[Mar11] Martin, A., Knauth, T., et al. Scalable and Low-Latency Data Processing with Stream MapReduce. In
CLOUDCOM, 2011.
[Mar12] Marek, L, Zheng, Y, Ansaloni, D, Sarimbekov, A, Binder, W, Tuma, P. Java bytecode instrumentation
made easy: The DiSL framework for dynamic program analysis. 2012.
[Mas11]: Mastelic, T., Emeakaroha, V. C., Maurer, M., Brandic, I. M4CLOUD - Generic Application Level
Monitoring For Resource-Shared Cloud Environments. 2011.
[Maz12] Michele Mazzucco and Dmytro Dyachuk. Optimizing cloud providers' revenues via energy efficient
server allocation. Sustainable Computing: Informatics and Systems, 2(1):1-12, 2012.
[Men94] Menascé, D., Almeida, V., and Dowdy, L. Capacity Planning and Performance Modeling: From
Mainframes to Client-Server Systems. Prentice Hall, 1994.
[Men03] Menasce, D, Bennani, M. On the use of performance models to design self-managing computer
systems. Computer Measurement Group Conference. 2003.
[Men05] Menasce, D, Bennani, M, Ruan, H. On the use of online analytic performance models in self-managing
and self-organizing computer systems. Self-star Properties in Complex Information Systems. 2005.
[Men07] Menasce, D, Ruan, H, Gomaa, H. QoS management in service-oriented architecture. Performance
Evaluation. 2007.
[Men08]: Meng, S., Kashyap, S. R., Venkatramani, C., Liu, L. Resource-Aware Application State Monitoring
(REMO). IEEE Transactions On Parallel And Distributed Systems. 2008.
[Men11] Ishai Menache, Asuman Ozdaglar, and Nahum Shimkin. Socially optimal pricing of cloud computing
resources. In Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies
and Tools, VALUETOOLS '11, pages 322-331, Brussels, Belgium, 2011. ICST (Institute for Computer Sciences,
Social-Informatics and Telecommunications Engineering).
[Mey04]: Meyerhöfer, M., Neumann, C. TESTEJB - A Measurement Framework for EJBs. Proceedings of the
7th International Symposium on Component-Based Software Engineering (CBSE 2004), Edinburgh, UK, May
24-25, 2004, pp. 294-301.
[Mos02]: Mos, A., Murphy, J. A framework for performance monitoring, modelling and prediction of
component oriented distributed systems. Proceedings of the 3rd international workshop on Software and
performance (WOSP '02), 2002.
[MOS1] Dana Petcu, Ciprian Craciun, Massimiliano Rak: Towards a Cross Platform Cloud API -- Components
for Cloud Federation; 2011
[MOS2] Ciprian Craciun: Building blocks of scalable applications; Master's thesis; 2012;
https://github.com/downloads/cipriancraciun/masters-thesis/thesis.pdf
[MOS3] mOSAIC notes -- Component controller (accessed in 2013);
http://wiki.volution.ro/Mosaic/Notes/Platform
[MOS4] mOSAIC notes -- Component hub (accessed in 2013); http://wiki.volution.ro/Mosaic/Notes/Hub
[MOS5] mOSAIC BitBucket repositories (accessed in 2013); https://bitbucket.org/mosaic
[Mun07] Munagala, K., Srivastava, U., and Widom, J. Optimization of Continuous Queries with Shared
Expensive Filters. In Proc. ACM Intl. Symp. on Principles of Database Systems (PODS 2007), pages 215-224,
2007.
[Nash54] J. Nash. Non-cooperative games. The Annals of Mathematics, 54(2):286-295, 1951.
[Nee11] Neelakanta, G, Veeravalli, B. On the resource allocation and pricing strategies in compute clouds using
bargaining approaches. International Conference on Networks (ICON). 2011.
[Nem95] Nemani, M, Ravikanth, R, Bamieh, B. Identification of linear parametrically varying systems. IEEE
Conference on Decision and Control. 1995.
[Neu10] Neumeyer, L., Robbins, B., et al. S4: Distributed Stream Computing Platform. In ICDMW, 2010.
[Ope07]: OpenSOA, Service Data Objects Specification. http://www.oasis-opencsa.org/sdo, 2007.
[Pac08] Pacifici, G, Segmuller, W, Spreitzer, M, Tantawi, A. CPU demand for web serving: Measurement
analysis and dynamic estimation. ACM SIGMETRICS Performance Evaluation Review. 2008.
[Par06]: Parsons, T., Murphy, J. The 2nd International Middleware Doctoral Symposium: Detecting
Performance Antipatterns in Component-Based Enterprise Systems. IEEE Distributed Systems Online, vol. 7,
no. 3, March, 2006.
[Par07]: Parsons, T. Automatic Detection of Performance Design and Deployment Antipatterns in Component
Based Enterprise Systems. Ph.D. Thesis, 2007, University College Dublin.
[Par08]: Parsons, T., Murphy, J. Detecting Performance Antipatterns in Component Based Enterprise Systems.
Journal of Object Technology, vol. 7, no. 3, 2008.
[Pou10] Poussot-Vassal, C, Tanelli, M, Lovera, M. Linear parametrically varying MPC for combined quality of
service and energy management in web service systems. American Control Conference. 2010.
[Pow05] Powers, R, Goldszmidt, M, Cohen, I. Short term performance forecasting in enterprise
systems. International Conference on Knowledge Discovery and Data Mining (SIGKDD). 2005.
[PUP1] Puppet Labs (accessed in 2013); https://puppetlabs.com/
[PUP2] Puppet Labs -- What is Puppet? (accessed in 2013); https://puppetlabs.com/puppet/what-is-puppet
[PUP3] Puppet Labs -- Big Picture (accessed in 2013);
http://projects.puppetlabs.com/projects/puppet/wiki/Big_Picture
[PUP4] Puppet Labs -- What is Puppet? (slides) (accessed in 2013);
http://www.mit.edu/people/marthag/talks/puppet/img2.html
[PUP5] Puppet Labs -- Glossary (accessed in 2013); http://docs.puppetlabs.com/references/glossary.html
[PUP6] Puppet Labs -- Reference Manual (accessed in 2013); http://docs.puppetlabs.com/puppet/2.7/reference
[PUP7] Puppet Labs -- Tools (accessed in 2013); http://docs.puppetlabs.com/guides/tools.html
[PUP8] Puppet Labs -- Exported Resources (accessed in 2013);
http://docs.puppetlabs.com/puppet/2.7/reference/lang_exported.html
[PUP9] Puppet Labs -- Compare Puppet Enterprise (accessed in 2013);
https://puppetlabs.com/puppet/enterprise-vs-open-source
[PUP10] Puppet Labs -- System Requirements (accessed in 2013);
http://docs.puppetlabs.com/puppet/3/reference/system_requirements.html
[Ran06] P. Ranganathan, P. Leech, D. Irwin, and J. Chase. Ensemble-level Power Management for Dense Blade
Servers. SIGARCH Comput. Archit. News, 34, 2006.
[Ris02] A. Riska. Aggregate Matrix-Analytic techniques and their applications. PhD thesis, Computer Science,
College of William & Mary, Williamsburg, VA, 2002.
[Rol95] Rolia, J, Vetland, V. Parameter estimation for performance models of distributed application systems.
Conference of the Centre for Advanced Studies on Collaborative Research (CASCON). 1995.
[Rol98] Rolia, J, Vetland, V. Correlating resource demand information with ARM data for application services.
International Workshop on Software and Performance (WOSP). 1998.
[Rub] RUBiS: Rice University Bidding System. http://rubis.ow2.org.
[Sei87] Seidmann, A, Schweitzer, P, Shalev-Oren, S. Computerized closed queueing network models of flexible
manufacturing systems. Large Scale Systems. 1987.
[Sha08] Sharma, A, Bhagwan, R, Choudhury, M, Golubchik, L, Govindan, R, Voelker, G. Automatic request
categorization in internet services. ACM SIGMETRICS Performance Evaluation Review. 2008.
[Sha10]: Shao, J., Wei, H., Wang, Q., Mei, H. A Runtime Model Based Monitoring Approach for Cloud
(RMCM). 2010 IEEE 3rd International Conference on Cloud Computing.
[Shi06] Shivam, P, Babu, S, Chase, J. Learning application models for utility resource planning. International
Conference on Autonomic Computing (ICAC). 2006.
[Son12] Yang Song, Murtaza Zafer, and Kang-Won Lee. Optimal bidding in spot instance market. In Albert G.
Greenberg and Kazem Sohraby, editors, INFOCOM, pages 190-198. IEEE, 2012.
[Spi11] Spinner, S. Evaluating approaches to resource demand estimation (Master Thesis). Karlsruhe Institute of
Technology. 2011.
[Sri05]: Srivastava, U., Munagala, K., Widom, J. Operator placement for in-network stream query processing. In
PODS, 2005, pp. 250-258.
[Sri08] Shekhar Srikantaiah, Aman Kansal, and Feng Zhao. Energy aware consolidation for cloud computing. In
Proceedings of the 2008 conference on Power aware computing and systems, HotPower '08, pages 10-10,
Berkeley, CA, USA, 2008. USENIX Association.
[Sut08] Sutton, C, Jordan, M. Probabilistic inference in queueing networks. In Workshop on Tackling Computer
Systems Problems with Machine Learning Techniques (SysML). 2008.
[Tan12] Tan, Y, Nguyen, H, Shen, Z, Gu, X, Venkatramani, C, Rajan, D. PREPARE: Predictive performance
anomaly prevention for virtualized cloud systems. International Conference on Distributed Computing Systems
(ICDCS). 2012.
[Tes05] Tesauro, G, Das, R, Walsh, W, Kephart, J. Utility-function driven resource allocation in autonomic
systems. International Conference on Autonomic Computing (ICAC). 2005.
[Tes06] Tesauro, G, Jongt, N, Das, R, Bennanit, M. A hybrid reinforcement learning approach to autonomic
resource allocation. International Conference on Autonomic Computing (ICAC). 2006.
[The08] Thereska, E, Ganger, G. IRONModel: Robust performance models in the wild. ACM SIGMETRICS
Performance Evaluation Review. 2008.
[Tia11] Fengguang Tian and Keke Chen. Towards optimal resource provisioning for running mapreduce
programs in public clouds. In Liu and Parashar [Liu11], pages 155-162.
[Tpc] Transaction processing performance council. TPC-W. http://www.tpc.org/tpcw.
[Tur07]: Turnbull, J. Pulling Strings with Puppet, FirstPress, 2007, 1st edn.
[Twi13] Twitter Storm. github.com/nathanmarz/storm/wiki , 2013
[Urg05] Urgaonkar, B, Pacifici, G, Shenoy, P, Spreitzery, M, Tantawi, A. An analytical model for multitier
internet services and its applications. ACM SIGMETRICS Performance Evaluation Review. 2005.
[Vaq08] Luis M. Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in the clouds:
towards a cloud definition. SIGCOMM Comput. Commun. Rev., 39(1):50-55, December 2008.
[Val11] Giuseppe Valetto, Paul L. Snyder, Daniel J. Dubois, Elisabetta Di Nitto, and Nicolo Maria
Calcavecchia. A self-organized load-balancing algorithm for overlay-based decentralized service networks. In
SASO, pages 168-177, 2011.
[Ven11]: Venticinque, S., Di Martino, B., Petcu, D. Agent-based cloud provisioning and management, design
and prototypal implementation. In F. Leymann, I. Ivanov, M. van Sinderen, and B. Shishkov, editors, 1st
International Conference on Cloud Computing and Services Science (CLOSER 2011), pages 184-191. SciTePress,
2011.
[Ver02] Verdult, V. Nonlinear system identification: A state-space approach. Ph.D. dissertation. Twente
University Press. 2002.
[Ver07] Vercauteren, T, Aggarwal, P, Wang, X, Li, T. Hierarchical forecasting of web server workload using
sequential monte carlo training. IEEE Transactions on Signal Processing. 2007.
[Wan03] W. Zhang and W. Zhang. Linux Virtual Server Clusters. Linux Magazine, November 2003.
[Wan05] Wang, X, Abraham, A, Smith, K. Intelligent web traffic mining and analysis. Journal of Network and
Computer Applications. 2005.
[Wan12] Jian Wan, Dechuan Deng, and Congfeng Jiang. Non-cooperative gaming and bidding model based
resource allocation in virtual machine environment. In IPDPS Workshops, pages 2183-2188. IEEE Computer
Society, 2012.
[Wan12b] Lijuan Wang and Jun Shen. Towards bio-inspired cost minimisation for data-intensive service
provision. In Services Economics (SE), 2012 IEEE First International Conference on, pages 16-23, June 2012.
[WAZ1] Windows Azure Documentation -- Introducing Windows Azure (accessed in 2013);
http://www.windowsazure.com/en-us/develop/net/fundamentals/intro-to-windows-azure
[WAZ2] Windows Azure Documentation -- Windows Azure Execution Models (accessed in 2013);
http://www.windowsazure.com/en-us/develop/net/fundamentals/compute
[Wei10] Guiyi Wei, Athanasios V. Vasilakos, Yao Zheng, and Naixue Xiong. A game-theoretic method of fair
resource allocation for cloud computing services. The Journal of Supercomputing, 54(2):252-269, 2010.
[Wik13] Wikipedia, Data Migration, http://en.wikipedia.org/wiki/Data_migration 2013.
[Win09] Wingerden, J. Control of wind turbines with smart rotors: Proof of concept & LPV subspace
identification. Ph.D. dissertation. Delft University of Technology. 2009.
[Woo95] Woodside, C, Neilson, J, Petriu, D, Majumdar, S. The stochastic rendezvous network model for
performance of synchronous Client-Server-like distributed software. IEEE Transactions on Computers. 1995.
[Woo06] Woodside, C, Zheng, T, Litoiu, M. Service system resource management based on a tracked layered
performance model. International Conference on Autonomic Computing (ICAC). 2006.
[Wu08] Wu, X, Woodside, M. A calibration framework for capturing and calibrating software performance
models. European Performance Engineering Workshop on Computer Performance Engineering (EPEW). 2008.
[Wu10] Wu, Y, Hwang, K, Yuan, Y, Zheng, W. Adaptive workload prediction of grid performance in
confidence windows. IEEE Transaction on Parallel and Distributed Systems. 2010.
[Wu12] Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya. SLA-based admission control for a
software-as-a-service provider in cloud computing environments. J. Comput. Syst. Sci., 78(5):1280-1299, 2012.
[Xia12] Z. Xiao, Q. Chen, and H. Luo. Automatic scaling of internet applications for cloud computing services.
Computers, IEEE Transactions on, PP(99):1, 2012.
[Xio11] PengCheng Xiong, Zhikui Wang, Simon Malkowski, Qingyang Wang, Deepal Jayasinghe, and Calton
Pu. Economical and robust provisioning of n-tier cloud workloads: A multi-level control approach. In ICDCS,
pages 571580. IEEE Computer Society, 2011.
[Xio13] Xiong, P, Pu, C, Zhu, X, Griffith, R. vPerfGuard: an automated model-driven framework for application
performance diagnosis in consolidated cloud environments. International Conference on Performance
Engineering (ICPE), 2013.
[Xu07] Xu, J, Zhao, M, Fortes, J, Carpenter, R, Yousif, M. On the use of fuzzy modeling in virtualized data
center management. International Conference on Autonomic Computing (ICAC). 2007.
[Yal04]: Yalagandula P., Dahlin, M.. A scalable distributed information management system. In SIGCOMM,
2004, pp. 379-390.
[Zaf12] Murtaza Zafer, Yang Song, and Kang-Won Lee. Optimal bids for spot VMs in a cloud for deadline
constrained jobs. In Chang [Cha12], pages 75-82.
[Zam12] Sharrukh Zaman and Daniel Grosu. An online mechanism for dynamic VM provisioning and allocation
in clouds. In Chang [Cha12], pages 253-260.
[Zha03] Zhang, G. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing.
2003.
[Zha07] Zhang, Q, Cherkasova, L, Smirni, E. A regression-based analytic model for dynamic resource
provisioning of multi-tier applications. International Conference on Autonomic Computing (ICAC). 2007.
[Zhe05] Zheng, T, Yang, J, Woodside, M, Litoiu, M, Iszlai, M. Tracking time-varying parameters in software
systems with extended Kalman filters. Conference of the Centre for Advanced Studies on Collaborative
Research (CASCON). 2005.
[Zhe08] Zheng, T, Woodside, M, Litoiu, M. Performance model estimation and tracking using optimal filters.
IEEE Transactions on Software Engineering. 2008.
Appendix A Run-time platform evaluation criteria
Although many of the surveyed solutions, and other existing ones, are production-ready (some even backed by
powerful companies in the IT sector) and offer many features, we must focus our effort on determining whether
they are a good match for the MODAClouds requirements, described in a later section. Such a goal implies two
separate conditions:
first of all, they should be suitable for our industrial partners' case-study applications; this in turn implies
matching the supported programming languages, the palette of available resources and middleware, and
not least the security requirements;
second, in order to fulfil our project's goal, they must provide a certain flexibility, allowing our run-time
environment to integrate with them and to provide enhanced services and support for the user's application.
Therefore, we are especially interested in the following aspects:
type
One of the categories mentioned at the beginning of this section; it broadly describes the purpose of the
solution and the range of features it offers.
PaaS --- fully integrated solution that abstracts away all low-level details of the deployment and
execution;
application execution --- suitable only for application execution, meaning that it does not manage the
host environment it runs in, like operating system, machine, etc.; (classical examples would be Tomcat
and derivatives, Ruby on Rails, etc.;)
application deployment --- as above suitable only for application deployment, thus implying that the
environment be provided by other means; (classical examples would be package managers, Capistrano
for Ruby, etc.;)
server deployment --- suitable to deploy the entire host environment, possibly even including the
application deployment, but it will still require an application execution solution; (classical examples
would be Chef or Puppet;)
task automation --- low-level tools that, if required, would allow us to quickly implement our own
solution fitting one of the above categories; (classical examples would be Ant for Java, Fabric
for Python, etc.;)
library --- the described solution is actually a library to be used inside our programs; here we include
also platforms or frameworks, which although more complex than libraries, are still used only to
develop applications;
service --- solutions that are stand-alone services; on their own they do not provide direct benefits,
but they are either used as dependencies of our environment or, if integrated, provide added value
to it and thus to our users; (for example database systems, various middleware, logging or
monitoring systems or SaaS, etc.;)
standard --- although not a ready to be used solution, this could be a protocol, data format, guidelines
or other kind of specification, that could prove useful to implement or follow ourselves;
suitability
In short, how mature, or production-ready, is the solution? Does it have a supportive community built around it?
production --- mature and stable enough to be adopted as-is;
emerging --- usually either a very popular solution, or one backed by a large company, but not yet
reaching or surpassing the beta status;
prototype --- maybe not the best solution to adopt, but it could have important features that we could
leverage or re-implement;
legacy --- although not a choice for most new developments, it could prove important to address,
because it either has a large deployment base, or it is mandated by one of the case studies;
application domain
What would be the main flavour of targeted applications?
web applications;
map-reduce applications;
generic compute-, data-, or network-intensive applications;
application architecture
Broadly, which application architecture the solution targets.
2-tier applications --- monolithic applications that, besides the data storage or communication layer,
have a single layer handling all the concerns from user interface to logic;
n-tier applications --- SOA-inspired applications where parts of the application are clearly identified as
independent layers, and deployed accordingly;
application restrictions
What constraints would the application (and part of our run-time environment) be subjected to?
none --- the application is able to use all the features of the targeted programming language and the
targeted framework, including full control over the run-time environment; moreover the application is
able to interact with other OS artifacts (like the file system, processes, sockets, etc.); (e.g. AWS Elastic
Beanstalk;)
container --- like in the case of no restrictions, except that interactions with the run-time or the OS are
limited;
limited --- the application is able to use only some features of the targeted language or framework, and
most likely interactions with the run-time and the OS are limited (i.e. native libraries are forbidden,
file-system access is restricted, etc.); (e.g. Google App Engine;)
programming languages
Self-explanatory.
programming frameworks
Some solutions target a particular framework (such as Servlets for Google App Engine's Java environment,
Capistrano tightly focused on Ruby on Rails deployment, etc.). Thus it would prove useful to know in advance
which are the officially sanctioned or preferred frameworks.
scalability
How can scalability be achieved?
automatic scalability --- based on user defined policies the platform is able to provision and commit
new computing resources; (i.e. the platform decides and executes;)
manual scalability --- the user is able to control via a high-level UI or CLI the amount of provisioned
and committed computing resources; (i.e. the operator decides, the platform executes;) (this implies that
the platform is able to provision new resources by itself;)
passive scalability --- the platform itself is able to scale if computing resources are manually provided
by the operator himself; (i.e. the operator decides and executes, the platform only takes notice and
reacts;) (this implies that the platform is not able to provision resources by itself;)
session affinity
Usually a PaaS offers HTTP request routers (or dispatchers); how do they load-balance clients among the
multiple available service instances?
transparent --- the solution provides automatic session replication between multiple instances (most
likely through a shared database);
sticky-sessions --- all the requests originating from the same client are routed to the same instance;
non-deterministic --- (self-explanatory);
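The sticky-sessions strategy above can be sketched by deterministically hashing a client identifier onto the list of available instances, so that the same client always reaches the same instance; the function name and inputs are illustrative assumptions.

```python
import hashlib

def route(client_id, instances):
    """Sticky-session routing sketch: hash the client identifier
    and map it onto the instance list. The same client is always
    routed to the same instance as long as the list is unchanged."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return instances[int(digest, 16) % len(instances)]
```

A transparent router would not need this stickiness, since session state is replicated (most likely through a shared database) across instances.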
interaction
How can we pragmatically interact with the proposed solution?
WS (Web Service) --- the interaction can be made through HTTP calls (either SOAP+WSDL or RESTful);
(this implies that there is a public specification of such calls, or that they are easily reverse-engineered);
WUI (Web User Interface) --- although this interface is provided remotely through HTTP, it's suitable
for human operators and can't be easily consumed by an automated tool;
CLI (Command Line Interface) --- there are command-line tools that interact with the solution (most
likely through HTTP or some form of RPC); (this implies that the input/output formats are easy to parse
by another tool and, as above, that a specification is available);
CUI (Console User Interface) --- the provided command line tools are not suitable for being invoked by
other tools, because for example the input / output are human-centric and difficult to parse;
API (Application Programming Interface) --- the solution also provides a library that abstracts one of
the previous interaction methods;
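The CLI-versus-CUI distinction above can be illustrated with a sketch of one hypothetical tool emitting both styles of output; the data and function name are purely illustrative assumptions.

```python
import json

def list_instances(machine_readable=True):
    """Sketch of a listing command. Machine-readable output (CLI
    style) is trivially parsed by other tools; the human-centric
    table (CUI style) is awkward for automation."""
    # Illustrative data; a real tool would query the platform.
    instances = [{"id": "vm-1", "state": "running"},
                 {"id": "vm-2", "state": "stopped"}]
    if machine_readable:
        return json.dumps(instances)        # CLI: parseable JSON
    return "\n".join(f"{i['id']:8} {i['state']}"
                     for i in instances)    # CUI: human-centric table
```

The same contrast applies one level up: a WS exposes a specified HTTP interface for tools, while a WUI is meant only for human operators.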
hosting type
How would we be able to use the proposed solution?
hosted --- the proper meaning of the term PaaS;
deployable (closed-source) --- available for deployment in a private cloud, but the code is closed-
source;
deployable (open-source) --- available for deployment in a private cloud, and the code is available as
open-source, thus enabling modifications;
simulated --- there is an option to deploy locally a similar solution for development and debugging
purposes;
portability
If a developer uses a particular solution, how easy is it to move to another solution playing the same role?
locked -- to move to a different solution would require massive rewriting of the application;
portable -- possible with minor updates to the application;
out-of-the-box -- the solution uses existing standards thus portability is guaranteed;
services
Especially in the case of PaaS, what additional resources or services (such as databases, middleware, etc.) are
available and managed directly by the solution, and thus integrated with the application life-cycle?
monitoring coverage
Especially in the case of PaaS, how much do the monitoring facilities cover and expose to the operator?
none -- the solution provides no monitoring options (except maybe the listing of running processes or
logging, etc.);
basic -- the usual information, comprising CPU, memory, and disk usage;
extensive -- it provides many other metrics than the ones above;
monitoring level
From which perspective, or at which level of the software and infrastructure stack, are the metrics provided?
application -- the data is collected from within the application itself; (for example by using NewRelic,
etc.;)
container -- the data is collected from within the VM or the container; it could refer to the VM or the
container itself or the whole running application;
hypervisor -- the data is collected by the virtualization solution;
fabric -- the data is collected at the infrastructure layer; (for example raw disks, load balancers, routers,
switches, etc.);
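Application-level monitoring, as described above, can be sketched as a decorator that records call latencies from within the application itself; this is an illustrative assumption of the approach, not how any specific agent (such as NewRelic) is implemented.

```python
import time

def timed(fn):
    """Application-level monitoring sketch: wrap a function so each
    call's latency is recorded inside the application itself, the
    same level at which in-process monitoring agents hook in."""
    samples = []
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        samples.append(time.perf_counter() - start)  # seconds
        return result
    wrapper.samples = samples  # expose collected metrics
    return wrapper
```

Container-, hypervisor-, or fabric-level monitoring would instead observe the application from outside, without instrumenting its code.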
monitoring interface
What technique --- such as standard, API, library, etc. --- is used to expose the monitoring information to the
operator?
resource providers
Most PaaS solutions do not own hardware resources themselves, but are instead built on top of other publicly
accessible IaaS providers. Thus, if the user needs services not offered by the PaaS itself, they could use that
IaaS to host the missing functionality themselves.
multi-tenancy
This characteristic pertains mainly to PaaS or PaaS-like solutions, and tries to assess whether multiple
applications can share the same instance of the PaaS.
single application --- the entire PaaS instance is dedicated to only one application; (some deployable
PaaS solutions fit into this category;)
single organization --- the PaaS is able to host multiple independent applications, but they should
belong to the same organization, mainly because the security model is restricted, or the scheduling
model implies a fair behaviour; (almost all other deployable PaaS solutions fit into this category;)
multiple organizations --- the PaaS is shared between multiple parties, each possibly with multiple
applications; (all hosted PaaS's fit in this category;)
resource sharing
This characteristic pertains mainly to PaaS or PaaS-like solutions, and tries to assess how the application's
components or services are mapped onto the provisioned VMs.
1:1 --- each component or service (from each application, where applicable) is deployed on its own VM;
such a usage pattern better fits heavy-weight applications that have few component or service types and
a constantly high load; thus one instance does not interfere with another through shared resource
consumption;
n:1 --- more than one component or service (potentially from different applications, in case of
multi-tenancy) can be deployed on the same VM, thus sharing its resources; this usage pattern allows
cost savings, especially in development or initial deployments, until the product gains traction and
increased load, at which point a 1:1 pattern would prove more efficient;
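The two mapping patterns above can be sketched as a small placement function; the first-fit packing and the `capacity` parameter are illustrative assumptions, since real platforms place components by resource demand rather than by simple counts.

```python
def map_components(components, pattern, capacity=3):
    """Sketch of component-to-VM placement.

    '1:1' gives each component its own VM; 'n:1' packs up to
    `capacity` components per VM (naive first-fit), sharing
    resources to save cost. Returns a list of VMs, each a list
    of the components placed on it."""
    if pattern == "1:1":
        return [[c] for c in components]
    vms = []
    for c in components:
        if vms and len(vms[-1]) < capacity:
            vms[-1].append(c)   # room left on the last VM
        else:
            vms.append([c])     # open a new VM
    return vms
```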
limitations
Most of the solutions impose quantitative limitations (such as memory, bandwidth, storage, etc.) on the running
applications, which could be of interest especially in determining the suitability for our case studies.
We should observe that not all of these properties or capabilities apply to all the surveyed solutions.