TITLE:
Using Optimized Distributional Parameters as Inputs in a Sequential Unsupervised and Supervised Modeling of Sunspots Data
AUTHORS:
K. Mwitondi, J. Bugrien, K. Wang
KEYWORDS:
Clustering; Data Mining; Density Estimation; EM Algorithm; Sunspots; Supervised Modelling; Support Vector Machines; Unsupervised Modelling
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol.6 No.7B, October 23, 2013
ABSTRACT:
Detecting naturally arising structures in data is central to knowledge extraction from data. In most applications, the main challenge is the choice of an appropriate model for exploring the data features. The choice is generally poorly understood and any tentative choice may be too restrictive. Growing volumes of data, disparate data sources and modelling techniques entail the need for model optimization via adaptability rather than comparability. We propose a novel two-stage algorithm for modelling continuous data, consisting of an unsupervised stage, in which the algorithm searches through the data for optimal parameter values, and a supervised stage that adapts the parameters for predictive modelling. The method is applied to the sunspots data, which have inherently Gaussian distributional properties and assumed bi-modality. Optimal values separating high from low cycles are obtained via multiple simulations. Early patterns for each recorded cycle reveal that the first 3 years provide a sufficient basis for predicting the peak. Multiple Support Vector Machine runs using repeatedly improved data parameters show that the approach yields greater accuracy and reliability than conventional approaches and provides a good basis for model selection. Model reliability is established via multiple simulations of this type.
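
The unsupervised stage described in the abstract can be illustrated with a minimal sketch. It assumes a two-component univariate Gaussian mixture fitted by EM, in line with the stated Gaussian properties and assumed bi-modality. It is not the authors' implementation. The function names (`fit_two_gaussians`, `prob_high`), the initialisation scheme and the synthetic "cycle peak" values are all hypothetical, chosen only for illustration, and no real sunspots data are used.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture (illustrative only).

    Returns (weights, means, sds), ordered so that component 0 has the
    lower mean ("low" cycles) and component 1 the higher ("high" cycles).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Crude initialisation (an assumption): split the sorted data in half.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    spread = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each value came from component 1.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k, r_k in ((0, [1.0 - r for r in resp]), (1, resp)):
            n_k = max(sum(r_k), 1e-9)
            means[k] = sum(r * x for r, x in zip(r_k, values)) / n_k
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(r_k, values)) / n_k
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = n_k / len(values)

    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


def prob_high(x, weights, means, sds):
    """Posterior probability that x belongs to the higher-mean component."""
    p0 = weights[0] * normal_pdf(x, means[0], sds[0])
    p1 = weights[1] * normal_pdf(x, means[1], sds[1])
    return p1 / (p0 + p1)


# Synthetic stand-in for cycle peaks: two Gaussian groups (NOT real data).
rng = random.Random(0)
peaks = ([rng.gauss(80.0, 10.0) for _ in range(150)]
         + [rng.gauss(150.0, 15.0) for _ in range(150)])

weights, means, sds = fit_two_gaussians(peaks)
# Label each value as low (0) or high (1) using the fitted parameters.
labels = [int(prob_high(x, weights, means, sds) > 0.5) for x in peaks]
```

In a two-stage scheme of the kind the abstract outlines, labels such as these, together with early-cycle features, could then be passed to a supervised classifier such as a Support Vector Machine. That second stage is not shown here.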