Using Optimized Distributional Parameters as Inputs in a Sequential Unsupervised and Supervised Modeling of Sunspots Data


Detecting naturally arising structures in data is central to knowledge extraction. In most applications, the main challenge is choosing an appropriate model for exploring the data features. That choice is generally poorly understood, and any tentative choice may be too restrictive. Growing volumes of data, disparate data sources and modelling techniques entail the need for model optimization via adaptability rather than comparability. We propose a novel two-stage algorithm for modelling continuous data, consisting of an unsupervised stage in which the algorithm searches the data for optimal parameter values and a supervised stage that adapts those parameters for predictive modelling. The method is applied to the sunspots data, which have inherently Gaussian distributional properties and assumed bi-modality. Optimal values separating high from low cycles are obtained via multiple simulations. Early patterns for each recorded cycle reveal that the first 3 years provide a sufficient basis for predicting the peak. Multiple Support Vector Machine runs using repeatedly improved data parameters show that the approach yields greater accuracy and reliability than conventional approaches and provides a good basis for model selection. Model reliability is established via multiple simulations of this type.
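As a rough illustration of the unsupervised stage only, the sketch below fits a two-component Gaussian mixture to one-dimensional data with plain EM and then locates the value at which the two components are equally likely, which is one plausible reading of a value "separating high from low cycles". This is an assumption made for illustration, not necessarily the authors' procedure: the function names (fit_two_gaussians, separating_threshold) are hypothetical, the sample is synthetic rather than real sunspot data, and the supervised Support Vector Machine stage is not shown.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, sds); component 0 has the lower mean.
    Hypothetical helper for illustration only.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Crude initialisation: split the sorted sample into two halves.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    spread = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each value is in component 1.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(resp)
        n0 = n - n1
        means = [
            sum((1 - r) * x for r, x in zip(resp, values)) / n0,
            sum(r * x for r, x in zip(resp, values)) / n1,
        ]
        var0 = sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0
        var1 = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1
        sds = [max(1e-6, math.sqrt(var0)), max(1e-6, math.sqrt(var1))]
        weights = [n0 / n, n1 / n]
    if means[0] > means[1]:
        # Keep the "low" component first.
        weights, means, sds = weights[::-1], means[::-1], sds[::-1]
    return weights, means, sds


def separating_threshold(weights, means, sds, tol=1e-9):
    """Bisect between the two means for the point of equal posterior."""
    def diff(x):
        return (weights[1] * normal_pdf(x, means[1], sds[1])
                - weights[0] * normal_pdf(x, means[0], sds[0]))

    lo, hi = means[0], means[1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid  # the low component still dominates at mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic bimodal sample (NOT real sunspot numbers): "low" and "high" values.
random.seed(0)
sample = ([random.gauss(60, 10) for _ in range(200)]
          + [random.gauss(140, 15) for _ in range(200)])

weights, means, sds = fit_two_gaussians(sample)
threshold = separating_threshold(weights, means, sds)
print("estimated means:", means)
print("threshold separating low from high:", threshold)
```

In a two-stage setup of the kind the abstract describes, a threshold like this could be used to label observations as high or low before a classifier is trained on them; repeating the fit over resampled data would give a sense of how stable the estimated parameters are.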

Share and Cite:

K. Mwitondi, J. Bugrien and K. Wang, "Using Optimized Distributional Parameters as Inputs in a Sequential Unsupervised and Supervised Modeling of Sunspots Data," Journal of Software Engineering and Applications, Vol. 6 No. 7B, 2013, pp. 34-41. doi: 10.4236/jsea.2013.67B007.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.