Journal of Data Analysis and Information Processing

Volume 2, Issue 4 (November 2014)

ISSN Print: 2327-7211   ISSN Online: 2327-7203

Google-based Impact Factor: 1.59

A Feature Subset Selection Technique for High Dimensional Data Using Symmetric Uncertainty

PP. 95-105
DOI: 10.4236/jdaip.2014.24012

ABSTRACT

With the abundance of exceptionally high-dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high-dimensional datasets. We present a novel filter-based approach that ranks the features by a score, and we then measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted feature list and search for important features in both the forward and the reverse direction, starting simultaneously from the first and the last feature of the sorted list. The approach is highly scalable and effective because it parallelizes over both attributes and tuples simultaneously, which allows us to evaluate many potential features in high-dimensional datasets. Experiments on real and synthetic high-dimensional datasets show that the proposed framework is very valuable and improves the precision of the selected features. We have also compared its classification accuracy against that of various other feature selection methods.
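
The abstract describes the pipeline only in prose. The following is a minimal sketch of two of its pieces, under stated assumptions. The first piece scores each discrete feature by its symmetric uncertainty with the class label, SU(X, Y) = 2·[H(X) − H(X|Y)] / [H(X) + H(Y)], and then ranks the features by that score. The second is a hypothetical two-pointer scan that walks the ranked list from the first and the last feature at the same time. The helper names, the `threshold` parameter, the assumption that features are already discretized, and the cut-off rule are all illustrative. They are not the authors' published procedure. The paper's method additionally measures classifier performance on the selected subsets and parallelizes the work, and both of those steps are omitted here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so they share no information
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy     # mutual information = H(X) - H(X|Y)
    return 2.0 * info_gain / (h_x + h_y)


def rank_features(columns, labels):
    """Return (feature_index, SU) pairs sorted by decreasing SU with the class."""
    scores = [(i, symmetric_uncertainty(col, labels)) for i, col in enumerate(columns)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def two_ended_cutoff(ranked, threshold):
    """Hypothetical scan of a ranked list from both ends at once.

    A forward pointer starts at the best feature and a reverse pointer starts at
    the worst. They step alternately until they meet at the boundary between
    features with SU >= threshold and those below it. Returns the indices of
    the accepted features in rank order.
    """
    lo, hi = 0, len(ranked)  # ranked[:lo] accepted, ranked[hi:] rejected
    while lo < hi:
        # Forward step from the top of the ranking.
        if ranked[lo][1] < threshold:
            hi = lo  # list is sorted: everything from lo onward fails
            break
        lo += 1
        # Reverse step from the bottom of the ranking.
        if lo < hi:
            if ranked[hi - 1][1] >= threshold:
                lo = hi  # list is sorted: everything before hi passes
                break
            hi -= 1
    return [i for i, _ in ranked[:lo]]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = [
        [0, 0, 1, 1],  # identical to the class label
        [0, 1, 0, 1],  # independent of the class label
        [1, 1, 1, 1],  # constant
    ]
    ranked = rank_features(features, labels)
    print(ranked)                                    # [(0, 1.0), (1, 0.0), (2, 0.0)]
    print(two_ended_cutoff(ranked, threshold=0.5))   # [0]
```

The ranking is sorted, so the two pointers always meet at the same boundary that a single threshold test would find. The scan is included only to make "starting from the first and last feature simultaneously" concrete. A real implementation following the paper would replace the fixed threshold with the authors' own evaluation of each partition.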

Cite as:

Singh, B., Kushwaha, N. and Vyas, O. (2014) A Feature Subset Selection Technique for High Dimensional Data Using Symmetric Uncertainty. Journal of Data Analysis and Information Processing, 2, 95-105. doi: 10.4236/jdaip.2014.24012.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.