Intelligent Information Management, 2012, 4, 291-295
http://dx.doi.org/10.4236/iim.2012.425041 Published Online October 2012 (http://www.SciRP.org/journal/iim)
Automatic Risk Identification in Software Projects:
An Approach Based on Inductive Learning
Julia Botan Machado, Silvio do Lago Pereira
Department of Information Technology, São Paulo State Technological College—FATEC/SP, São Paulo, Brazil
Email: julia.botan@gmail.com, slago@pq.cnpq.br
Received August 30, 2012; revised October 4, 2012; accepted October 16, 2012
ABSTRACT
Effective risk management is very important to increase the probability of success in software projects. Indeed, like
other types of projects, software projects are also susceptible to various problems that can lead to the cancellation of
their development or to the development of systems that do not meet the client's requirements. One of the main activities
of risk management is risk identification, because the list of risks generated in this activity is used all along the
risk control process. Thus, this work proposes the creation of an expert system capable of identifying risks in
software projects by using the lessons inductively learned from similar software projects already developed. By using
this proposed expert system, project managers and software developers should be able to avoid errors of the past.
Keywords: Risk Management; Risk Identification; Software Engineering; Expert System
1. Introduction
There are many different definitions of risk in the literature.
In this work, risks are defined as future events with some
probability of occurrence and a potential for loss. Every
project is subject to risks and the role of a project risk
manager is to anticipate the risks that can compromise
the successful completion of a project and to plan how to
proceed if they occur, in order to minimize the loss [1].
Effective risk management is crucial for the success of
a project. Notwithstanding, risk identification is a very
hard prediction problem and most software project man-
agers still have great difficulty in performing this task. In
order to overcome this difficulty, this work proposes the
implementation of an expert system capable of identify-
ing risks in software projects, by using lessons induc-
tively learned from similar projects developed in the past.
The assumption is that the experience acquired in previ-
ous projects is the main tool which developers have to
aid them in the management of new similar projects. The
proposed automatic risk identification procedure is based
on the checklist technique [2,3], in which a checklist is
used to verify whether a specific risk can or cannot occur
in a project. In the system, the checklist is represented by
a decision tree [4], built by an inductive learning algo-
rithm [5,6] that works over a database containing char-
acteristics of previous projects and their corresponding
risks pointed out by experts in risk management.
The remaining sections of this paper are organized as follows: Section 2 defines the problem addressed in this
work; Section 3 presents the concepts and techniques of artificial intelligence used to implement an expert system
to solve that problem; Section 4 briefly discusses empirical results obtained with the system; and, finally, Section 5 presents the final conclusions.
2. Risk Management in Software Projects
Although risk management is not a linear process, it is often divided into four phases: risk identification, risk as-
sessment, risk response planning, and risk monitoring
and controlling [1]. Clearly, risk identification is the
main phase in this process, since all the other phases de-
pend on the correct identification of the risks. In fact, to
properly manage risks, the first thing that risk managers
should be able to do is to determine what risks can damage their projects and to recognize their characteristics.
Risk identification is particularly important for soft-
ware projects, because this kind of project involves in-
herent uncertainties that are very hard to control, e.g.,
technological innovations and changes in the client’s re-
quirements. Indeed, due to these uncertainties, most
software projects do not comply with the deadline or
budget initially planned for them and, even worse, most
of the products do not meet the client’s expectations in
terms of functionality and quality [7].
Paradoxically, in spite of the fact that most of the pro-
ject failures are closely related to failures in the risk
identification phase, most project managers and soft-
ware developers still perceive this activity as a useless
and hard extra work and, as soon as they can, they read-
ily forget it [8]. This happens mainly because there are
few tools that can be used to make this activity easier [9].
To identify risks in new projects, the best practices in
risk management established by PMBOK (Project Man-
agement Body of Knowledge) [10] strongly recommend
the use of historical data, collected during the risk identi-
fication phase for similar projects developed in the past.
However, although most organizations have a
large volume of documents about previous projects, the
manual extraction of useful information from this data is
not an easy task.
Thus, the main contribution of this work is to propose
a tool that can aid project managers and software devel-
opers in the task of risk identification and, consequently,
to prevent such an important activity from being over-
looked. More specifically, the proposed tool is an expert
system that can identify risks in new projects, based on
the history of risks already identified in similar projects.
3. The Expert System for Risk Identification
Experts in risk management advise that an effective risk identification should be performed by taking into account
results of studies done by experts in risk management, as
well as documents about lessons learned during the risk
management process for other similar projects already
concluded [11]. To do so, project risk managers should
collect documents describing project characteristics and
the corresponding risks identified for them.
By using such a collection of documents, it is possible
to implement a computer system that automatically iden-
tifies risks in new projects, based on the experience ac-
cumulated by human experts in the past. In fact, this is
the very approach adopted in this work, as depicted in
Figure 1. Moreover, at the end of each project, the sys-
tem can update its knowledge base with the new lessons
learned, such that they can be used in future projects (in-
creasing the effectiveness and efficiency of the system).
The background on artificial intelligence and the tech-
niques used to implement this system are succinctly in-
troduced in the next two subsections.
3.1. Inductive Learning of Decision Trees
A decision tree [4] is a data structure, representing a set
of classification rules, which can be used to model induc-
tive learning and decision making abilities. The decision
tree construction emulates a learning process, while its
use emulates a decision making process.
A decision tree is a decision support tool, in the form
of a tree graph, which models decisions and their possi-
ble consequences. A decision tree learning algorithm is a
method used in data mining [12] to create a model that
predicts the value of an output variable, or target variable,
based on the values of input variables. A trivial example
of a decision tree, with only one input variable, is de-
picted in Figure 2. In such a tree, each nonterminal node
corresponds to an input variable; the edges leaving a non-
terminal node represent all the possible values of that in-
put variable; and each leaf represents a value of the target
variable, given the values of the input variables repre-
sented by the path from the root to the leaf.
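As an illustration of the structure just described, the following minimal Java sketch (purely illustrative, not the implementation reported in Section 4) represents a decision tree node: an internal node stores the input variable it tests and one child per possible value of that variable, while a leaf stores a value of the target variable. The main method builds the trivial "umbrella" tree of Figure 2.

```java
import java.util.Map;

// Illustrative representation of a decision tree node (not the authors' code):
// internal nodes test an input variable and have one child per possible value;
// leaves carry a value of the target variable.
public final class UmbrellaTree {

    static final class Node {
        final String inputVariable;       // tested input variable (null in a leaf)
        final Map<String, Node> children; // one edge per possible value (null in a leaf)
        final String targetValue;         // value of the target variable (null in an internal node)

        Node(String inputVariable, Map<String, Node> children) {
            this.inputVariable = inputVariable;
            this.children = children;
            this.targetValue = null;
        }

        Node(String targetValue) {
            this.inputVariable = null;
            this.children = null;
            this.targetValue = targetValue;
        }
    }

    public static void main(String[] args) {
        // The trivial tree of Figure 2: one input variable and two leaves.
        Node tree = new Node("Does it rain?", Map.of(
                "Yes", new Node("Take umbrella"),
                "No", new Node("Don't take umbrella")));
        // Following the edge labelled with the known value of the input variable
        // yields the predicted value of the target variable.
        System.out.println(tree.children.get("Yes").targetValue); // Take umbrella
    }
}
```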
To build a decision tree, an inductive learning algo-
rithm needs to receive as input a set of examples of the
concept that it should learn. Thus, this kind of learning
is called supervised learning. Besides, the set of exam-
ples is often called the training dataset. Each example is a
tuple formed by the values of the input variables (al-
ways available) and also the value of the output variable,
or target variable (available only in examples). The idea
is that, by analyzing the correlations among the values
of input and output variables, the learning algorithm can
build a hypothesis that, afterwards, can be used to cor-
rectly predict the target variable value, in cases where
only the input variable values are known. To validate
the efficiency of the decision tree built, another set of
brand new examples, called the test dataset, is used. In this
case, the values of the target variable in the examples
are compared with those predicted by the hypothesis.
The efficiency of the decision tree can be given as the
ratio of the number of hits to the number of examples
in the test dataset.
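As a rough sketch of this validation step (under assumed names and types, not the authors' implementation), the efficiency can be computed by applying the learned hypothesis to every example of the test dataset and counting the hits:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative validation sketch: the hypothesis (e.g., a learned decision tree
// wrapped as a function from input values to a predicted target value) is applied
// to every test example, and the efficiency is the ratio of hits to test examples.
public final class Validation {

    // A test example carries its input-variable values plus the known target value.
    record Example(Map<String, String> inputs, String target) {}

    static double efficiency(Function<Map<String, String>, String> hypothesis,
                             List<Example> testDataset) {
        long hits = testDataset.stream()
                .filter(e -> hypothesis.apply(e.inputs()).equals(e.target()))
                .count();
        return (double) hits / testDataset.size();
    }

    public static void main(String[] args) {
        // Hypothetical hypothesis and test data, only to make the sketch runnable.
        Function<Map<String, String>, String> alwaysYes = inputs -> "Yes";
        List<Example> test = List.of(
                new Example(Map.of("Team experience", "Low"), "Yes"),
                new Example(Map.of("Team experience", "High"), "No"));
        System.out.println(efficiency(alwaysYes, test)); // 0.5
    }
}
```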
A tree can be built by recursively splitting a training
dataset into subsets based on the values of a selected
Figure 1. The architecture of the expert system.
Figure 2. A decision tree for the “umbrella problem”.
input variable. The recursion terminates when all the examples in the subset at a node have the same value for the target variable, or when splitting no longer enhances the predictions. This process of top-down partitioning is a kind of greedy algorithm, and it is the most common strategy for learning decision trees from data. After construction and validation, the resulting decision tree can be used to emulate an efficient decision making process.
To guarantee the efficiency of the decision tree, the inductive learning algorithm uses the concepts of entropy and information gain [4] to choose the input variable to label each nonterminal node. The entropy is a measure based on the occurrence probability of each possible value of the target variable among the examples of the dataset. The information gain represents the estimated reduction of the entropy value resulting from the partition of the set of examples according to the values of the input variable selected to label a node.
Formally, entropy and information gain can be defined
as follows. Let $E$ be a training dataset with examples of the form $\langle x_1, x_2, \ldots, x_m, y \rangle$, where each $x_i$ is the value of an input variable $v_i$, for $1 \le i \le m$, and $y$ is the value of the target variable. The entropy $h$ of $E$ is computed from the proportions $p_j$ of the possible values of the target variable among the examples in $E$:

$$h(E) = -\sum_{j} p_j \log_2 p_j$$

For a given input variable $v_i$ with $k$ possible values $x_1, \ldots, x_k$, let $E_{v_i = x_j}$ denote the subset of examples in $E$ where $v_i$ has the value $x_j$. The information gain $g$ for $v_i$ is then defined in terms of the entropy $h$ as follows:

$$g(E, v_i) = h(E) - \sum_{j=1}^{k} \frac{\left|E_{v_i = x_j}\right|}{|E|}\, h\!\left(E_{v_i = x_j}\right)$$

The information gain is equal to the total entropy for an input variable if and only if, for each value of that variable, the target variable has the same value.
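A compact Java sketch of these definitions and of the recursive construction described above, in the spirit of ID3 [4], is given below. It is an illustration under assumed names and simplifications (e.g., no handling of missing values or numeric attributes), not the code of the implemented system.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of ID3-style tree induction following the definitions above
// (illustrative only; not the code of the implemented expert system).
public final class Id3Sketch {

    // A training example: values of the input variables plus the target variable value.
    record Example(Map<String, String> inputs, String target) {}

    // Internal node: tested variable + one child per value; leaf: target value only.
    record Node(String variable, Map<String, Node> children, String target) {
        Node(String variable, Map<String, Node> children) { this(variable, children, null); }
        Node(String target) { this(null, null, target); }
    }

    // h(E) = - sum_j p_j log2 p_j, over the proportions of the target values in E.
    static double entropy(List<Example> examples) {
        Map<String, Long> counts = examples.stream()
                .collect(Collectors.groupingBy(Example::target, Collectors.counting()));
        double h = 0.0;
        for (long c : counts.values()) {
            double p = (double) c / examples.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // g(E, v) = h(E) - sum_j (|E_j| / |E|) h(E_j), where E_j groups E by the values of v.
    static double gain(List<Example> examples, String variable) {
        double remainder = 0.0;
        for (List<Example> subset : partition(examples, variable).values()) {
            remainder += ((double) subset.size() / examples.size()) * entropy(subset);
        }
        return entropy(examples) - remainder;
    }

    static Map<String, List<Example>> partition(List<Example> examples, String variable) {
        return examples.stream().collect(Collectors.groupingBy(e -> e.inputs().get(variable)));
    }

    // Top-down greedy construction: stop on a pure subset (entropy 0) or when no
    // variables remain; otherwise split on the variable with the highest gain.
    static Node build(List<Example> examples, Set<String> variables) {
        if (variables.isEmpty() || entropy(examples) == 0.0) {
            return new Node(majorityTarget(examples));
        }
        String best = variables.stream()
                .max(Comparator.comparingDouble((String v) -> gain(examples, v)))
                .orElseThrow();
        Set<String> remaining = new HashSet<>(variables);
        remaining.remove(best);
        Map<String, Node> children = new HashMap<>();
        partition(examples, best).forEach((value, subset) -> children.put(value, build(subset, remaining)));
        return new Node(best, children);
    }

    static String majorityTarget(List<Example> examples) {
        return examples.stream()
                .collect(Collectors.groupingBy(Example::target, Collectors.counting()))
                .entrySet().stream().max(Map.Entry.comparingByValue()).orElseThrow().getKey();
    }

    public static void main(String[] args) {
        // Tiny hypothetical training dataset, only to make the sketch runnable.
        List<Example> training = List.of(
                new Example(Map.of("Team experience", "Low", "Project priority", "High"), "Yes"),
                new Example(Map.of("Team experience", "High", "Project priority", "High"), "No"),
                new Example(Map.of("Team experience", "Low", "Project priority", "Medium"), "Yes"));
        Node tree = build(training, Set.of("Team experience", "Project priority"));
        System.out.println(tree.variable()); // prints the most informative variable ("Team experience")
    }
}
```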
3.2. Expert Systems
In artificial intelligence, an expert system [13,14] is a
computer program that emulates the ability of decision
making of a human expert. In fact, by reasoning over
facts and rules available in a knowledge base, an expert
system is capable of solving very complex problems.
The standard architecture of an expert system (Figure
1) consists of a user interface that allows the communi-
cation with the user, a knowledge base that stores the
knowledge about the specific application domain, and an inference engine that uses the available knowledge to solve problems proposed by the user.
In the expert system proposed in this work, the knowl-
edge base is implemented as a set of decision trees (one
tree for each risk) and the inference engine is a procedure
that selects a proper decision tree in the knowledge base
and, by reasoning with the rules encoded on this tree,
decides whether a specific risk can or cannot occur, ac-
cording to the project characteristics informed by the
user.
The decision trees used to populate the knowledge
base of the expert system are automatically generated by
an algorithm of supervised inductive learning. The in-
ductive reasoning implemented by this algorithm allows
the generation of rules about conditions that necessarily imply specific risks, by analyzing a set of documents
with lessons learned in previously developed projects.
These rules form, in fact, a predictive model that can be
used to identify risks in new projects.
To identify risks in a new project, all that a risk man-
ager needs to do is to access the user interface of the ex-
pert system and inform the project characteristics. Then,
the expert system should answer with a list of risks
automatically identified for that project.
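A minimal sketch of this inference step is shown below, with a hypothetical decision tree and hypothetical names (the real knowledge base contains the trees learned from the historical data): the knowledge base maps each known risk to a decision tree, and a risk is reported when walking the corresponding tree with the informed project characteristics reaches a "Yes" leaf.

```java
import java.util.Map;

// Illustrative inference sketch (not the authors' implementation): traverse the
// decision tree associated with each known risk, following the edges labelled
// with the project characteristics informed by the user.
public final class InferenceSketch {

    // Internal node: tested characteristic + one child per value; leaf: "Yes" or "No".
    record Node(String characteristic, Map<String, Node> children, String answer) {}

    static boolean riskIdentified(Node tree, Map<String, String> projectCharacteristics) {
        Node node = tree;
        while (node.answer() == null) {                          // descend until a leaf is reached
            String value = projectCharacteristics.get(node.characteristic());
            node = node.children().get(value);                   // follow the edge labelled with that value
        }
        return "Yes".equals(node.answer());
    }

    public static void main(String[] args) {
        // Hypothetical decision tree for one risk; the values below are illustrative only.
        Node cancellation = new Node("Team experience", Map.of(
                "Low", new Node(null, null, "Yes"),
                "High", new Node("Project priority", Map.of(
                        "Medium", new Node(null, null, "Yes"),
                        "High", new Node(null, null, "No"),
                        "Low", new Node(null, null, "No")), null)), null);

        Map<String, Node> knowledgeBase = Map.of("Risk of project cancellation", cancellation);

        Map<String, String> project = Map.of("Team experience", "High", "Project priority", "Medium");
        knowledgeBase.forEach((risk, tree) -> {
            if (riskIdentified(tree, project)) System.out.println("Identified: " + risk);
        });
    }
}
```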
4. The Experiment with the Expert System
To verify the effectiveness of the proposed solution for
automatic risk identification, the expert system of Figure
1 was implemented in the Java programming language,
based on the inductive tree learning algorithm ID3 [4].
This section reports some details of the experiment
performed with the system and discusses some empirical
results, as well.
4.1. Populating the Knowledge Base
In order to decide whether a new project has a specific
risk, the expert system must use a list of known risks. As
said before, this list can be generated from a collection of
documents describing lessons learned in previous pro-
jects. Basically, there are two types of risk: generic risks, which threaten most projects, and specific risks, which threaten only the particular project under evaluation. Generic
risks can be easily detected by the expert system. On the
other hand, the detection of specific risks is more com-
plicated because, if they were not detected in previous
projects, the knowledge of the expert system might be
insufficient to detect their presence in a new project.
Moreover, to compare new projects with previous
projects, and decide whether they are similar or not, the
expert system needs to use a predefined set of character-
istics which are common to all projects. These charac-
teristics must be related to risk categories, so that the
expert system can reason properly.
Documents about 20 real software projects were used
in the experiment performed with the implemented ex-
pert system. These documents were kindly delivered by
their respective project risk managers, who also answered
a questionnaire about their project characteristics and
associated detected risks.
The final characterization of the projects was based on the following attributes (input variables):
User involvement;
Team experience;
Appropriate team size (relative number);
Staff geographical distribution;
Team size (absolute number);
Project priority;
Number of involved systems;
Number of involved technological platforms;
Number of involved databases;
Project size (small, medium, large, huge);
Existence of test/approval environment.
A list of seven generic risks (output variables), present in most software projects, was also created (see the illustrative example sketched after the list):
Risk of failing to meet the planned deadline;
Risk of failing to meet the planned cost;
Risk of generating a low quality product;
Risk of ill-defined scope;
Risk of project cancellation;
Risk of project postponement;
Risk of generating a product that does not meet the
needs of the user.
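The following Java snippet illustrates how one training example might be encoded from the questionnaire answers, combining the eleven project characteristics above with the observed outcome for one of the generic risks. All concrete values in the snippet are hypothetical and are not taken from the 20 real projects used in the experiment.

```java
import java.util.List;
import java.util.Map;

// Purely illustrative encoding of a single training example
// (hypothetical values; not data from the projects used in the experiment).
public final class TrainingData {

    // Input-variable values, the generic risk under consideration, and whether it was observed.
    record Example(Map<String, String> inputs, String risk, String observed) {}

    public static void main(String[] args) {
        Example example = new Example(
                Map.ofEntries(
                        Map.entry("User involvement", "High"),
                        Map.entry("Team experience", "Low"),
                        Map.entry("Appropriate team size", "No"),
                        Map.entry("Staff geographical distribution", "Distributed"),
                        Map.entry("Team size", "Small"),
                        Map.entry("Project priority", "Medium"),
                        Map.entry("Number of involved systems", "3"),
                        Map.entry("Number of involved technological platforms", "2"),
                        Map.entry("Number of involved databases", "1"),
                        Map.entry("Project size", "Medium"),
                        Map.entry("Test/approval environment", "Yes")),
                "Risk of project cancellation",
                "Yes");
        List<Example> trainingDataset = List.of(example);
        System.out.println(trainingDataset.size() + " example(s) prepared");
    }
}
```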
Based on the managers’ answers for 15 projects, and
each one of these generic risks, a set of documents was
formatted to be taken as input (i.e., training dataset) by
the inductive learning module of the expert system. The
resulting trees, built by this module, were used to popu-
late the expert system knowledge base. More precisely, a set of seven decision trees, one for each of the considered generic risks, was generated. An example of
such decision trees is depicted in Figure 3.
Thus, given the characteristics of a specific software
project, the expert system can use the rules extracted
from the decision tree in Figure 3 to inform whether or not this project has the risk of being cancelled. In this tree, each internal node corresponds to an input variable, whose value is informed by the user, and each leaf corresponds to a possible value of the output variable “risk of project cancellation”, whose value should be predicted
by the system.
4.2. Empirical Results
To validate the expert system, the remaining 5 projects
were used as a test dataset.
Figure 3. A decision tree for “risk of project cancellation” (internal nodes test team experience, project priority, and team size).
The list of identified risks generated by the expert system, for each one of these projects, was compared with the results obtained through the questionnaire answered by the risk managers, referred to in the previous subsection.
In total, 35 evaluations were performed (i.e., 7 risks for each one of the 5 projects). It was observed that the result given by the expert system differed from that given by the corresponding project manager in only 8 of these evaluations; in other words, the system agreed with the experts in 27 of the 35 cases. Thus, the implemented system presented a hit rate of 27/35 ≈ 77.14%. This seems to be a very promising result.
Furthermore, it is believed that, with a knowledge base
populated with more historical data collected from pre-
viously developed projects, it is possible to increase this
hit rate even further.
5. Conclusions
This paper proposed an expert system capable of identi-
fying risks in software projects, by using lessons induc-
tively learned from similar projects developed in the past.
To evaluate the effectiveness of the implemented sys-
tem, an experiment involving data about real software
projects was performed. This experiment showed that the
experience acquired in previous projects can really be
used to automatically identify risks in new projects and
to avoid repeating mistakes of the past, as well. Thus, by
using a knowledge base continuously updated with les-
sons learned in concluded projects, the performance of
the expert system will keep improving over time.
A weakness of the proposed system is that it can only identify risks already detected in previous projects. Hence, if a new project is subject to an unprecedented risk, the system fails to inform the project manager about this risk.
From this observation, it is important to highlight that the
proposed expert system is a tool that must be used only
to help the project risk manager to perform the risk iden-
tification task. A human expert is still indispensable.
6. Acknowledgements
The authors would like to thank CNPq, FAT, and especially the project managers who contributed to this research.
REFERENCES
[1] R. K. Wysocki, “Effective Project Management: Tradi-
tional, Agile, Extreme,” 5th Edition, John Wiley & Sons
Ltd., Chichester, 2010.
[2] C. A. R. Morano, C. G. Martins and M. L. R. Ferreira,
“Application of Techniques for the Identification of Risk
in the E & P Ventures,” Engevista, Vol. 8, No. 2, 2006,
pp. 120-133.
[3] H. P. Berger, “Risk Management: Procedures, Methods
and Experiences,” Reliability: Theory & Applications,
Vol. 2, No. 17, 2010, pp. 79-95.
[4] J. R. Quinlan, “Induction of Decision Trees,” Machine
Learning, Vol. 1, No. 1, 1986, pp. 81-106.
doi:10.1007/BF00116251
[5] A. Franco-Arcega, J. A. Carrasco-Ochoa, G. Sánchez-Díaz and J. F. Martínez-Trinidad, “Decision Tree Induction Using a Fast Splitting Attribute Selection for Large Datasets,” Expert Systems with Applications, Vol. 38, No. 11, 2011, pp. 14290-14300.
[6] J. Gao and Z. D. Han, “New Decision Tree Algorithm
with Restrained Factor Involved,” Physics Procedia, Vol.
25, 2012, pp. 1871-1878.
doi:10.1016/j.phpro.2012.03.324
[7] I. Sommerville, “Software Engineering,” 9th Edition,
Addison-Wesley, Boston, 2010.
[8] E. E. Odzaly, D. Greer and P. Sage, “Software Risk Man-
agement Barriers: An Empirical Study,” Proceedings of
the 3rd International Symposium on Empirical Software
Engineering and Measurement, Washington, 15-16 Oc-
tober 2009, pp. 418-421.
doi:10.1109/ESEM.2009.5316014
[9] J. Dhlamini, I. Nhamu and A. Kaihepa, “Intelligent Risk
Management Tools for Software Development,” Pro-
ceedings of the Annual Conference of the Southern Afri-
can Computer Lecturers Association, Eastern Cape, 2-11
July 2009, pp. 33-40.
[10] PMI Standards Committee, “A Guide to the Project
Management Body of Knowledge,” 4th Edition, Project
Management Institute, 2008.
[11] Y. H. Kwak and J. Stoddard, “Project Risk Management:
Lessons Learned from Software Development,” Elsevier
Science, Amsterdam, 2003.
doi:10.1016/S0166-4972(03)00033-6
[12] S. H. Liao, P. H. Chu and P. Y. Hsiao, “Data Mining
Techniques and Applications—A Decade Review from
2000 to 2011,” Expert Systems with Applications, Vol. 39,
No. 12, 2012, pp. 11303-11311.
doi:10.1016/j.eswa.2012.02.063
[13] S. H. Liao, “Expert System Methodologies and Applications—A Decade
Review from 1995 to 2004,” Expert Systems with Appli-
cations, Vol. 28, No. 1, 2005, pp. 93-103.
doi:10.1016/j.eswa.2004.08.003
[14] S. Lucci and D. Kopec, “Artificial Intelligence in the 21st
Century: A Living Introduction,” Mercury Learning and
Information, Duxbury, 2012.