Knowledge Discovery in Data: A Case Study

Abstract

It is common in industrial construction projects for data to be collected and discarded without being analyzed to extract useful knowledge. A proposed integrated methodology based on a five-step Knowledge Discovery in Data (KDD) model was developed to address this issue. The framework transfers existing multidimensional historical data from completed projects into useful knowledge for future projects. The model starts by understanding the problem domain, industrial construction projects. The second step is analyzing the problem data and its multiple dimensions. The target dataset is the labour resources data generated while managing industrial construction projects. The next step is developing the data collection model and prototype data ware-house. The data warehouse stores collected data in a ready-for-mining format and produces dynamic On Line Analytical Processing (OLAP) reports and graphs. Data was collected from a large western-Canadian structural steel fabricator to prove the applicability of the developed methodology. The proposed framework was applied to three different case studies to validate the applicability of the developed framework to real projects data.

Share and Cite:

Hammad, A. and AbouRizk, S. (2014) Knowledge Discovery in Data: A Case Study. Journal of Computer and Communications, 2, 1-28. doi: 10.4236/jcc.2014.25001.

1. Introduction

Many industrial construction projects face delays and budget overruns, often caused by improper management of labour resources [1] . The nature of industrial construction projects makes them more complicated: a large number of stakeholders with conflicting interests, sophisticated management tools, stricter safety and environmental concerns.

In the changing environment, each involved contractor simultaneously manages multiple projects using one pool of resources. During this process, a large amount of data is generated, collected, and stored in different formats, but it is not analyzed to extract useful knowledge. The improvement of labour management practices could have a significant impact on reducing schedule delays and budget overruns. One solution to this problem is analysis of historical labour resources data from completed projects to extract useful knowledge that can be transferred and used to improve resource management practices.

Data warehouses are one method often used to extract useful knowledge. They are dedicated, read-only, and nonvolatile databases that centrally store validated, multidimensional, historical data from Operation Support Systems (OSS) to be used by Decision Support Systems (DSS) [2] . Data warehouses are typically structured either on the star schema, consisting of a fact table that contains the data and dimension tables that contain the attributes of this data, for simple datasets, and on the snowflake schema, used either when multiple fact tables are needed or when dimension tables are hierarchical in nature [3] , for complicated datasets. A data warehouse typically consists of three main components: the data acquisition systems (backend), the central database, and the knowledge extraction tools (frontend) [4] . On Line Analytical Processing (OLAP) techniques (roll-up and drill-down, slice and dice, and data pivoting) are typically used in the frontend of a data warehouse to present end-users with a dynamic tool to view and analyze stored data.

Data mining is “the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owners” [5] . Considering data mining, the knowledge discovered must be previously unknown, non-trivial, and useful to the data owners [6] . Data mining techniques rely on either supervised or unsupervised learning and are grouped into four categories [7] . Clustering methods minimize the distance between data points falling within a cluster, and maximize the distance between these clustered data points and other clusters [8] . Finding Association Rules highlights hidden patterns in large datasets. Classification techniques, including Decision Trees, Rule-Based Algorithms, Artificial Neural Networks (ANN), k-Nearest Neighbours (k-NN or lazy learning), Support Vector Machine (SVM), and many others, build a model using a training dataset to define data classes, evaluate the model, and then use the developed model to classify each new data point into the appropriate class [7] . Outliers’ detection techniques focus on data points that are significantly different from the rest.

Data warehousing and mining techniques have been applied to solve problems in the construction industry over the last decade. However, none of the previous research applied these techniques to address management of multiple projects simultaneously using one common pool of labour resources; the problem is typically solved using other techniques (Heuristic rules, Numerical Optimization and Genetic Algorithms). Most previous research focused on leveling or allocating resources in a single project environment. Soibelman and Kim [9] analyzed schedule delays with a five-step KDD approach. Chau et al. [10] developed the Construction Management Decision Support System (CMDSS) by combining data warehousing, Decision Support Systems (DSS) and OLAP. Rujirayanyong and Shi [11] developed a Project-oriented Data Warehouse (PDW) for contractors, but it was limited to querying the warehouse without using data mining. Moon et al. [12] used a four-dimension cost data cube in their application of Cost Data Management System (CDMS), built using MS SQL Server-OLAP Analysis Services, to obtain more reliable estimates of construction costs. Fan et al. [13] used the Auto Regression Tree (ATR) data mining technique to predict the residual value of construction equipment.

In this research, the Cios et al. [7] hybrid model was modified and adapted to develop an integrated methodology for extracting useful knowledge from collected labour resources data in a multiple-project environment utilizing the concepts of KDD, data warehousing, and data mining. When the techniques are integrated, they combine quantitative and qualitative research approaches and facilitate working with large amounts of data impacted by a large number of unknown variables, which was integral to this research. Further information on the developed framework can be found in Hammad et al. [14] . The proposed integrated methodology based on a five-step Knowledge Discovery in Data (KDD) model is shown in figure 1.

In this paper, the proposed modified hybrid KDD model is applied to three different case studies to test its ability to extract useful knowledge from datasets. Section 2 discusses discovering knowledge in the first dataset; Section 3 covers the second dataset and Section 4 the third dataset. The paper outlines the process of applying the model to extract data, the related procedures, and outlines the useful data collected.

Figure 1. Modified hybrid KDD model.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Jergeas, G. (2008) Analysis of the Front-End Loading of Alberta Mega Oil Sands Projects. Project Management Journal, 39, 95-104. http://dx.doi.org/10.1002/pmj.20080
[2] Inmon, W.H. (2005) Building the Data Warehouse. Wiley, Indianapolis.
[3] Giovinazzo, W.A. (2000) Object-Oriented Data Warehouse Design: Building a Star Schema. Prentice Hall, Upper Saddle River.
[4] Ahmad, I., Azhar, S. and Lukauskis, P. (2004) Development of a Decision Support System Using Data Warehousing to Assist Builders/Developers in Site Selection. Automation in Construction, 13, 525-542. http://dx.doi.org/10.1016/j.autcon.2004.03.001
[5] Han, J. and Kamber, M. (2006) Data Mining: Concepts and Techniques. Morgan Kaufmann, Elsevier Science Distributor, San Francisco.
[6] Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996) From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17, 37.
[7] Cios, K.J. (2007) Data Mining: A Knowledge Discovery Approach. Springer, New York.
[8] Zaiane, O.R., Foss, A., Lee, C.H. and Wang, W. (2002) On Data Clustering Analysis: Scalability, Constraints, and Validation. Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, Berlin, 28-39.
[9] Soibelman, L. and Kim, H. (2002) Data Preparation Process for Construction Knowledge Generation through Knowledge Discovery in Databases. Journal of Computing in Civil Engineering, 16, 39-48. http://dx.doi.org/10.1061/(ASCE)0887-3801(2002)16:1(39)
[10] Chau, K.W., Cao, Y., Anson, M. and Zhang, J. (2002) Application of Data Warehouse and Decision Support System in Construction Management. Automation in Construction, 12, 213-224.
http://dx.doi.org/10.1016/S0926-5805(02)00087-0
[11] Rujirayanyong, T. and Shi, J.J. (2006) A Project-Oriented Data Warehouse for Construction. Automation in Construction, 15, 800-807. http://dx.doi.org/10.1016/j.autcon.2005.11.001
[12] Moon, S.W., Kim, J.S. and Kwon, K.N. (2007) Effectiveness of OLAP-Based Cost Data Management in Construction Cost Estimate. Automation in Construction, 16, 336-344. http://dx.doi.org/10.1016/ j.autcon.2006.07.008
[13] Fan, H., AbouRizk, S., Kim, H. and Zaiane, O. (2008) Assessing Residual Value of Heavy Construction Equipment Using Predictive Data Mining Model. Journal of Computing in Civil Engineering, 22, 181-191. http://dx.doi.org/10.1061/(ASCE)0887-3801(2008)22:3(181)
[14] Hammad, A., AbouRizk, S. and Mohamed, Y. (2013) Application of Knowledge Discovery in Data (KDD) Techniques to Extract Useful Knowledge from Labour Resources Data in Industrial Construction Projects. Journal of Management in Engineering. http://dx.doi.org/10.1061/(ASCE)ME.1943-5479. 0000280
[15] Zaiane, O.R. (2006) Principles of Knowledge Discovery in Data. Lecture at University of Alberta. http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf
[16] Witten, I.H. and Frank, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufman, Amsterdam, Boston.
[17] Teicholz, P. (1993) Forecasting Final Cost and Budget of Construction Projects. Journal of Computing in Civil Engineering, 7, 511-529. http://dx.doi.org/10.1061/(ASCE)0887-3801(1993)7:4(511)
[18] Nassar, N.K. (2005) An Integrated Framework for Evaluation, Forecasting and Optimization of Performance of Construction Projects. PhD Thesis, University of Alberta (Canada), Canada.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.