
Investigating the use of Bayesian networks as a support tool for monitoring software projects

Fábio Pittoli1, Abraham L. R. de Sousa1, 2


Centro Universitário La Salle - Unilasalle, Canoas, Brazil {fabio.pittoli@gmail.com, rabelo@unilasalle.edu.br}
Abstract: Monitoring software development is one of the most important activities in software project management. In this context, this paper proposes a Bayesian approach integrated with a software process management environment. The aim is to investigate how this probabilistic approach can be used for project monitoring. Preliminary results indicate that Bayesian networks enable both quantitative and qualitative evaluation of some common project management scenarios, giving the manager greater decision-making power.
Keywords: Software Development Process; Bayesian Network; Project Management

Daltro J Nunes2
Instituto de Informática, Universidade Federal do Rio Grande do Sul - UFRGS, Porto Alegre, Brazil {rabelo, daltro@inf.ufrgs.br}

the project can assess and monitor implementation costs, evaluate proposals, and develop realistic budgets and schedules. Whichever estimation method is chosen, it is important to note that estimation is a complex domain in which the causal relationships among factors are nondeterministic and inherently uncertain [6]. For example, we can assume that there is a clear relationship between development effort and team experience, and that effort decreases as team experience increases, although there are no concrete data proving it. It is therefore correct to say that handling estimates necessarily means dealing with uncertainty.

A. Bayesian Networks

Bayesian networks (BN) were developed in the early 1980s to facilitate prediction and diagnosis in Artificial Intelligence (AI) systems [1]. The name derives from the use of the formula for calculating probabilities established in 1763 by Thomas Bayes. According to [2], BN allow us to express complex cause-effect relationships in the problem under investigation. Graphically, a Bayesian network is composed of nodes, which represent random variables that take discrete or continuous values, and arcs, which represent the causal relationships among nodes. For example, consider the classic BN presented in [7] about a new burglar alarm. The alarm is very reliable; however, it can also be triggered by an earthquake. Two neighbors, John and Mary, have promised to call when the alarm rings. John always calls when he hears the alarm, but he sometimes confuses the telephone ringing with the alarm and calls in those cases too. Mary, however, likes to listen to loud music and sometimes does not hear the alarm. Fig. 1 shows the network topology defined for this case and the conditional probability table of each node. This form of representation can be used for both discrete and continuous variables. Each row of a table contains the conditional probability for one conditioning case of the parent nodes. A conditioning case is a possible combination of values of the parent nodes [8].

B. Related Work

Among the several studies that use BN to support the software development process, we tried to divide the references among

I. INTRODUCTION

One of the main challenges for any project manager is to ensure that a software development project is concluded within its constraints of time, cost, scope, and quality. Establishing these constraints means working with estimates. Software development, however, is an inherently uncertain endeavor: there is no way to guarantee that, as the project progresses, delays will not occur, resources will not run short, or the scope will not change. Bayesian networks [1][2][3] arise in this context of reasoning under uncertainty. They are used in situations where causal relations exist but our understanding of what is really happening is incomplete, so a probabilistic description is required. This type of network supports different kinds of reasoning, such as predictive analysis, investigating the impact of changes (cause and effect), and decision support. The main goal of this paper is to present the preliminary results of a study that aims to integrate a project management environment with Bayesian networks for monitoring software development projects. The manager interacts with a Bayesian model to identify potential behaviors of the project and then makes decisions based on them.

II. THEORETICAL GROUNDING

Estimating development time and cost is an activity that requires attention and has great influence on the software development process. It is the definition of the estimates that determines whether or not a project will succeed during its execution. Effort estimates are useful for both clients and developers [5]. Based on these estimates, the organization that wants to hire

some of the key areas of management and software development, such as risk management, failure prediction, and effort estimation.

activities and agenda of the developers. It was chosen as the supporting environment for project management because it allows rigorous control over the variables that make up a software process. Bayesian networks tool: in this study we used GeNIe 1, developed by the Decision Systems Laboratory of the University of Pittsburgh. GeNIe was used to model the Bayesian networks for the evaluation of the proposed model because of its ease of use and because the free version places no restriction on the maximum size of a BN.

A. Presentation of the monitoring model scenarios

To better understand the proposed monitoring and control model, it is necessary to present and analyze the scenarios in which the model operates and to observe the pre- and postconditions required for its proper operation. Fig. 2 presents scenario 1, which handles the BN configuration required for the model to function correctly. As prerequisites, a process must already be modeled in WebAPSEE, and the network topology must already be defined according to the aspect of the process to be monitored (time, cost, quality, etc.). A web interface then allows the user to select the relevant data from the running process according to that aspect. For example, to monitor the process with respect to time, it would be possible to select items such as the number of agents involved and the total hours remaining. The selected data are then extracted from the running process, and a table containing the extracted information is generated automatically. From this table it is possible to identify the information used to configure the evidence in the BN, which is the postcondition of scenario 1.
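The last step of scenario 1, turning extracted process data into evidence for the BN, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the field names, the node names (`TeamSize`, `RemainingEffort`), the state labels, and the thresholds are hypothetical and are not taken from WebAPSEE or from the networks used in this study.

```python
from dataclasses import dataclass


@dataclass
class ProcessSnapshot:
    """Data selected for the 'time' aspect (hypothetical field names)."""
    agents_involved: int
    hours_remaining: float


def discretize(value, thresholds, labels):
    """Return the label of the first threshold that `value` does not exceed.

    `labels` has one more entry than `thresholds`; values above every
    threshold receive the last label.
    """
    for limit, label in zip(thresholds, labels):
        if value <= limit:
            return label
    return labels[-1]


def to_evidence(snapshot):
    """Map raw process data to discrete node states (illustrative thresholds)."""
    states = ["Low", "Medium", "High"]
    return {
        "TeamSize": discretize(snapshot.agents_involved, [3, 8], states),
        "RemainingEffort": discretize(snapshot.hours_remaining, [80, 400], states),
    }


if __name__ == "__main__":
    print(to_evidence(ProcessSnapshot(agents_involved=5, hours_remaining=500)))
```

In practice the thresholds would have to be chosen to match the discretization used when the network was built; otherwise the evidence would not correspond to the states the conditional probability tables were defined for.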

Figure 1. Bayesian network with the conditional probabilities
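To make the burglar-alarm example concrete, the following minimal sketch computes the probability of a burglary given that both neighbors call, using inference by enumeration over the full joint distribution. The conditional probabilities are an assumption: they are the values commonly given for this example in the Russell and Norvig textbook, and the values shown in Fig. 1 are not reproduced in the text and may differ. The function names are illustrative only.

```python
from itertools import product

# ASSUMPTION: textbook values for the burglar-alarm network; the tables in
# Fig. 1 may use different numbers.
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
# P(Alarm = true | Burglary, Earthquake)
P_ALARM = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_JOHN_CALLS = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_MARY_CALLS = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true equals `value`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability as the product of each node's conditional probability."""
    return (
        bern(P_BURGLARY, b)
        * bern(P_EARTHQUAKE, e)
        * bern(P_ALARM[(b, e)], a)
        * bern(P_JOHN_CALLS[a], j)
        * bern(P_MARY_CALLS[a], m)
    )


def posterior_burglary(john_calls, mary_calls):
    """P(Burglary = true | JohnCalls, MaryCalls), summing out Earthquake and Alarm."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):
        weights[b] += joint(b, e, a, john_calls, mary_calls)
    return weights[True] / (weights[True] + weights[False])


if __name__ == "__main__":
    print(round(posterior_burglary(True, True), 3))  # 0.284
```

Enumeration is exponential in the number of variables, so it is only suitable for small networks such as this one; tools like GeNIe use more efficient inference algorithms.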

Risk Management: the study presented in [9] proposes a standard architecture for risk identification called the Risk Identification Pattern model. Using Bayesian networks as the main component of the model made it possible to represent the relationships among the risk factors present in web projects. Predicting Failures: [12] presents a review of the use of BN for predicting faults and software reliability. In addition, it proposes an approach that allows dynamic discretisation algorithms to be used for continuous nodes. Effort Estimation: a notable work is the comparative study of Bayesian network models for effort estimation in web projects presented in [6], which reports an investigation in which eight Bayesian network models were compared for their accuracy in estimating effort for web projects. The results showed that Bayesian networks are a suitable approach for handling effort estimates.

III. BAYESIAN NETWORKS AS A TOOL TO SUPPORT MONITORING OF PROJECTS

Monitoring software projects with mechanisms that detect changes during their progress helps to keep unexpected events from derailing the plan. If a deviation does occur, changes can be made to adapt to the new reality as quickly and with as little disruption as possible. Unlike other estimation approaches, which use parametric measures, BN suggest an approximate statistical value. The main software components of the proposed monitoring model are presented next. After that, a conceptual model of the proposed solution is shown, which describes how the tools that make up the solution interact, along with implementation characteristics and details. WebAPSEE [14]: this environment makes it possible to model a development process, defining the activities, the sequence among them, the roles involved, and the execution time. The environment allows the process to be executed through a machine that coordinates the activation of

Figure 2. Bayesian network configuration

http://genie.sis.pitt.edu/

Scenario 2 refers to the monitoring process. Its prerequisites are that the process is running and the BN is configured. First, it is necessary to identify changes of state in the running process. For this purpose, a Windows Service was developed to monitor the data involved in the process. This check for changes is performed at previously established time intervals, which makes the latest data from the running process available directly in the web interface. After that, the BN can be updated by propagating the current state of the process to the network. This is the postcondition required for scenario 2.

B. Prototype developed

After presenting the scenarios, it is necessary to present a prototype of the proposed model. First, it is essential to have a software process modeled and running in the WebAPSEE tool. Another important aspect is the selection of the network model to be used in the Bayesian network modeling tool, GeNIe. For example, if the aspect of the process to be monitored is Time, the Bayesian network model must have some node with characteristics related to Time. The same applies when monitoring other aspects of the process, such as Cost. From this point it becomes possible to begin monitoring and to use the web interface that was developed. Fig. 3 shows this web interface.
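The periodic check for changes described for scenario 2 can be sketched as a simple polling loop. This is only an illustration of the idea and not the Windows Service developed in this study: `read_state` and `on_change` are hypothetical callbacks standing in for the query against the process database and for the update of the data file, and the field names in the usage example are invented.

```python
import time
from typing import Callable, Dict, Optional

State = Dict[str, float]


def changed_fields(previous: Optional[State], current: State) -> State:
    """Return the fields whose values differ from the previous snapshot.

    Fields that disappear from `current` are not reported in this sketch.
    """
    if previous is None:
        return dict(current)
    return {key: value for key, value in current.items() if previous.get(key) != value}


def monitor(
    read_state: Callable[[], State],
    on_change: Callable[[State], None],
    interval_seconds: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `read_state` at a fixed interval and report changes via `on_change`.

    Returns the number of checks in which a change was detected.
    """
    previous: Optional[State] = None
    updates = 0
    for check in range(max_checks):
        current = read_state()
        delta = changed_fields(previous, current)
        if delta:
            on_change(delta)
            updates += 1
        previous = current
        if check < max_checks - 1:
            sleep(interval_seconds)
    return updates


if __name__ == "__main__":
    snapshots = iter([{"hours_remaining": 100.0}, {"hours_remaining": 90.0}])
    monitor(lambda: next(snapshots), print, interval_seconds=0.1, max_checks=2)
```

Passing the `sleep` function as a parameter keeps the loop testable without waiting for real time to elapse; a production service would run indefinitely rather than for a fixed number of checks.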

Service execution. The goal is to make the latest process data visible directly in the table, so that the evidence can be identified and used in the Bayesian network modeled in GeNIe or Netica.

IV. MODEL EVALUATION

For the evaluation, a development process is modeled in the WebAPSEE software. The idea is for this process to be as similar as possible to one specified for the development of a real software project. However, because this is the evaluation of a model created as a proof of concept, the process used is simplified: it contains only the features that are fundamental to the proposed assessment and is based on the tasks defined by the RUP (Rational Unified Process) for Small Projects methodology. Some Bayesian network models from the literature were used and adapted to match the data extracted from the process and present in the data file generated by the web interface. Thus, regardless of the characteristic chosen for monitoring, the Bayesian network defined will allow a faithful representation of the process running in WebAPSEE. The evaluation seeks to answer some key questions that project managers have about changes in estimates, which are recurrent in software projects.

A. Software process

To evaluate the proposed model, a process was modeled considering some of the main activities defined in the RUP for Small Projects methodology. Using WebAPSEE as the process modeling tool, we organized the activities by discipline, as proposed by the methodology. The disciplines used by the process were: Requirements, Project Management, Analysis, Implementation, Tests, and Change Management. It is also important to mention that this is an iterative and incremental process [15]; in other words, each stage is executed several times during the development process. This allows our understanding of the problem to increase through successive refinements, so that an effective solution is obtained after several iterations.

B. Bayesian networks models

The Bayesian networks used to implement and evaluate the scenarios were built on the model known as MODIST [16], which is concerned with the quality of predictions and with risk management in large software projects. The MODIST project is based on Bayesian networks and seeks to produce models of the development and testing process that take into account statistical concepts missing from traditional approaches to development. We decided to develop a Bayesian network to monitor the Requirements activity set, focusing on the aspect of time. One reason for choosing requirements is that this is where the main problems with estimates usually arise. It is also important to mention that it is entirely feasible to design new networks to monitor other aspects and activity sets, such as Tests, Change Management, or Analysis. Furthermore, the monitored process as a whole could be treated as a single activity, with a single network responsible for all aspects. Since the

Figure 3. Web interface for data collection

This web interface is organized as follows:
a) Characteristics of the process: at the top of the page there are several HTML radio controls, each of which identifies and allows the selection of one characteristic of a software process, such as Time, Cost, or Quality;
b) Related data: after a characteristic is selected, several checkbox controls are displayed in the field below, each corresponding to a data item related to the selected characteristic. Several data items related to the same characteristic can be selected;
c) Generate data file: a button control whose function is to confirm the selection of parameters, start the generation of the file containing the process data, display the data in the informative table containing the evidence data, and start the Windows Service responsible for monitoring the process database and updating the data file;
d) Informational message: if everything goes as expected and no errors occur, the data file is created/updated correctly, the Windows Service is started successfully, and the following message is displayed: File created/updated successfully. Monitoring service initialized correctly. However, if an error occurs during generation of the data file or during startup of the Windows Service, the following message is displayed: An error occurred while attempting to generate the file. The monitoring service was not initialized; and
e) Table containing the data from the monitored process: after the file containing the data from the monitored process is created or updated, a table in the web interface lists the data. This update is done periodically and is performed after each Windows

goal is to use the Bayesian network to answer some key questions that project managers have about possible changes in estimates, we divided the evaluation into scenarios. Each scenario presents a different situation regarding requirements and, from the results indicated by the network, an answer to the question raised is suggested.

C. Evaluated Scenarios

To perform inference in the Bayesian network and subsequently define and identify the evidence, in each evaluated scenario the network was first trained with historical data from 100 completed projects that have some similarity to the process under analysis. The historical database consists of data from laboratory simulation, and the purpose of the scenarios is to simulate real situations that may occur while monitoring a software process. A step-by-step example of a scenario used to verify the efficiency of the developed model follows. Consider a software process with the following characteristics: a) a high degree of novelty in what will be developed (Novelty); b) high complexity (Complexity); c) large size (Size); d) a team with a low degree of experience (Staff Experience); and e) high estimated effort (Estimated Effort). Question: will including more experienced professionals in the project decrease the effort required to complete it? After these characteristics are set as evidence in the modeled Bayesian network, the network shows a high probability (51% in this case) that the Required Effort to complete the project is High. Fig. 4 shows the behavior of the Bayesian network after the evidence is set. The question asked about this scenario concerns the possible reduction of the required effort if new professionals with a greater degree of experience are allocated to the process.
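The training step mentioned above, in which the network parameters are learned from historical project data, can be illustrated with a minimal sketch that estimates a conditional probability table by counting. This is an assumption about how such training might work in its simplest form; the text does not state which learning algorithm was used. The records are synthetic and exist only to exercise the code; they are not the 100 simulated projects of this study, and the resulting numbers are unrelated to the values reported for the scenario.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, parents, child, child_states, alpha=1.0):
    """Estimate P(child | parents) from records by counting.

    `alpha` is a Laplace smoothing pseudo-count that avoids zero
    probabilities for child states never observed with a parent combination.
    Parent combinations that never occur in `records` are omitted.
    """
    counts = defaultdict(Counter)
    for record in records:
        key = tuple(record[p] for p in parents)
        counts[key][record[child]] += 1
    cpt = {}
    for key, counter in counts.items():
        total = sum(counter.values()) + alpha * len(child_states)
        cpt[key] = {state: (counter[state] + alpha) / total for state in child_states}
    return cpt


# SYNTHETIC records (hypothetical, for illustration only).
RECORDS = (
    [{"StaffExperience": "Low", "RequiredEffort": "High"}] * 6
    + [{"StaffExperience": "Low", "RequiredEffort": "Low"}] * 2
    + [{"StaffExperience": "High", "RequiredEffort": "High"}] * 3
    + [{"StaffExperience": "High", "RequiredEffort": "Low"}] * 5
)

if __name__ == "__main__":
    cpt = estimate_cpt(RECORDS, ["StaffExperience"], "RequiredEffort", ["Low", "High"])
    for parent_state, distribution in cpt.items():
        print(parent_state, distribution)
```

With these synthetic counts, the smoothed estimate of a High required effort is 0.7 for a low-experience team and 0.4 for a high-experience team, which shows the kind of comparison the scenario question asks for.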

V. FINAL CONSIDERATIONS

This paper presented a probabilistic approach to support the monitoring of software development projects. The model presented here makes it possible to develop a series of other experiments and improvements. As future work, we can mention checking and measuring the impact that changes in estimates during the progress of the project have on the quality of the final product. Another point that could be explored further is centralizing in one place both the collection of data from the monitored process and the graphical display of the Bayesian networks for the subsequent configuration of inferences, which would make the model easier to use. Furthermore, new Bayesian networks can be developed to monitor groups of project activities other than those presented here. It is important to mention that the model presented here is constantly evolving: it can be improved and adapted to the software process used, and, depending on the Bayesian network models used, it can be more or less complex according to the aspect to be evaluated and the number of variables involved.

REFERENCES
[1] Charniak, E. Bayesian Networks Without Tears. AI Magazine, v. 12, n. 4, p. 50-63, 1991.
[2] Fenton, N. E.; Neil, M. A Critique of Software Defect Prediction Models. IEEE Transactions on Software Engineering, v. 25, p. 675-689, 1999.
[3] Stamelos, I. et al. Estimating the development cost of custom software. Inf. Manage., v. 40, p. 729-741, 2003.
[4] Mendes, E.; Mosley, N. Bayesian Network Models for Web Effort Prediction: A Comparative Study. IEEE Transactions on Software Engineering, v. 34, p. 723-737, 2008.
[5] Russell, S. J.; Norvig, P. Artificial intelligence: a modern approach. [S.l.]: Prentice-Hall, Inc., 1995. 415-429 p.
[6] Marques, R. L.; Dutra, I. Redes Bayesianas: o que são, para que servem, algoritmos e exemplos de aplicações. Universidade Federal do Rio de Janeiro. [S.l.]. 2000.
[7] Al-Rousan, T.; Sulaiman, S.; Salam, R. A. A risk identification architecture pattern based on Bayesian network. [S.l.]: [s.n.]. 2008. p. 110.
[8] Jeet, K. et al. A Tool for Aiding the Management of Schedule Overrun. IEEE 2nd International Advance Computing Conference, v. 2, p. 416-421, 2010.
[9] Xiaocong, H.; Ling, K. A risk management decision support system for project management based on bayesian network. 2nd IEEE International Conference on Information Management and Engineering, v. 2, p. 308-312, 2010.
[10] Fenton, N. E.; Neil, M.; Marquez, D. Using Bayesian Networks to Predict Software Defects and Reliability. [S.l.]: [s.n.]. 2007.
[11] Sanchez, A. J. Software maintenance project delays prediction using Bayesian Networks. Expert Syst. Appl., v. 34, p. 908-919, 2008.
[12] Ambiente WebAPSEE. Simpósio Brasileiro de Engenharia de Software. Florianópolis: Informática-UFSC, v. 1, p. 1-6, 2006.
[13] Kruchten, P. The Rational Unified Process: An Introduction. 3. ed. [S.l.]: Addison-Wesley Longman Publishing Co., Inc., 2003.
[14] Modist. Models of Uncertainty and Risk for Distributed Software Development. EC Information Society Technologies Project IST-2000-28749. [S.l.]. 2003.

Figure 4. Bayesian network with evidence of the scenario

Including more experienced professionals in the monitored process left the Required Effort high, but the probability of it being High decreased from 51% to 47%. Furthermore, the Accuracy, which indicates how close the estimates are to what is actually required, also improved, from 53% to 54%. Thus, for projects that are large, highly complex, and completely new in terms of technology or segment, and that start with an inexperienced team, adding professionals with a high level of experience in the problem domain decreases the required effort, although it remains high.
