Support Vector Machine-Based Fault Diagnosis of Power Transformer Using k Nearest-Neighbor Imputed DGA Dataset


Missing values are prevalent in real-world datasets and may reduce the predictive performance of a learning algorithm. Dissolved Gas Analysis (DGA), one of the most widely deployed methods for detecting and predicting incipient faults in power transformers, is also affected by this problem. This paper therefore proposes filling in the missing values found in a DGA dataset using the k-nearest neighbor imputation method with two different distance metrics: Euclidean and Cityblock. Using the imputed datasets as inputs, the study then applies a Support Vector Machine (SVM) to build models that classify transformer faults. Experimental results are provided to show the effectiveness of the proposed approach.
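
The imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `distance` and `knn_impute`, the use of `None` to mark missing values, the choice to average the k nearest donors, and the sample gas readings are all assumptions made for illustration. Distances are computed only over features observed in both rows, which is one common convention and may differ from the procedure used in the paper.

```python
import math

def distance(a, b, metric="euclidean"):
    """Distance over features observed in both rows (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared features, so the rows cannot be compared
    if metric == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs))
    if metric == "cityblock":
        return sum(abs(x - y) for x, y in pairs)
    raise ValueError(f"unknown metric: {metric}")

def knn_impute(rows, k=3, metric="euclidean"):
    """Return a copy of rows with each missing entry replaced by the mean
    of that feature over the k nearest rows that have it observed."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with feature j observed.
            donors = sorted(
                (distance(row, other, metric), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical gas concentrations (values invented for illustration only).
samples = [
    [10.0, 20.0, 1.0],
    [12.0, 22.0, 2.0],
    [50.0, 80.0, 9.0],
    [11.0, None, 1.5],  # second feature is missing
]

filled_euclidean = knn_impute(samples, k=2, metric="euclidean")
filled_cityblock = knn_impute(samples, k=2, metric="cityblock")
print(filled_euclidean[3], filled_cityblock[3])
```

In this toy example both metrics select the same two neighbors, so the imputed value is identical. On real data the two metrics can rank neighbors differently, which is why the imputed datasets are compared separately before training the classifier.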

Share and Cite:

Sahri, Z. and Yusof, R. (2014) Support Vector Machine-Based Fault Diagnosis of Power Transformer Using k Nearest-Neighbor Imputed DGA Dataset. Journal of Computer and Communications, 2, 22-31. doi: 10.4236/jcc.2014.29004.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.