Enhanced Binary Cuckoo Search With Frequent Values and Rough Set Theory for Feature Selection
Publication Type
Original research
Authors

Redundant and irrelevant features in datasets decrease classification accuracy and increase the computational time of classification algorithms, the risk of overfitting, and the complexity of the underlying classification model. Feature selection is a preprocessing technique used with classification algorithms to retain only the relevant features. Several approaches that combine Rough Set Theory (RST) with Nature Inspired Algorithms (NIAs) have been used successfully for feature selection. However, due to the inherent limitations of RST for some data types and the inefficient convergence of NIAs on high dimensional datasets, these approaches have mainly focused on a specific type of low dimensional nominal dataset. This paper proposes a new filter feature selection approach based on Binary Cuckoo Search (BCS) and RST, which is more efficient for low and high dimensional nominal, mixed and numerical datasets. It enhances BCS with new initialization and global update mechanisms that increase the efficiency of convergence on high dimensional datasets. It also develops a more efficient objective function for numerical, mixed and nominal datasets. The proposed approach was validated on 16 benchmark datasets (4 nominal, 4 mixed and 8 numerical) drawn from the UCI repository. It was also evaluated against standard BCS; five NIAs with fuzzy RST approaches; two popular traditional feature selection approaches; and multi-objective evolutionary, Genetic, and Particle Swarm Optimization (PSO) algorithms. Decision tree and naive Bayes algorithms were used to measure the classification performance of the proposed approach. The results show that the proposed approach achieved improved classification accuracy while minimizing the number of features compared to other state-of-the-art methods. The code is available at https://github.com/abualia4/EBCS.
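
To make the role of RST in a filter objective concrete, the following is a minimal sketch of the classical rough-set dependency degree on nominal data, combined with a simple size penalty. It is not the objective function proposed in the paper: the function names, the weight `alpha`, the weighted-sum form, and the toy data are all assumptions chosen for illustration, and the paper's handling of numerical and mixed data is not covered.

```python
from collections import defaultdict

def dependency_degree(rows, labels, subset):
    """Classical rough-set dependency: share of objects in the positive region."""
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their values on the chosen features.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        classes[key].append(label)
    # A class belongs to the positive region when all of its objects share one label.
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)

def objective(rows, labels, mask, alpha=0.9):
    """Hypothetical fitness (assumption): reward dependency, penalise subset size."""
    subset = [i for i, bit in enumerate(mask) if bit]
    if not subset:
        return 0.0  # an empty feature subset is treated as worthless
    gamma = dependency_degree(rows, labels, subset)
    size_term = 1 - len(subset) / len(mask)
    return alpha * gamma + (1 - alpha) * size_term

# Toy nominal data (illustrative only): feature 0 determines the class, feature 1 does not.
rows = [("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q")]
labels = [0, 0, 1, 1]

full = objective(rows, labels, [1, 1, 1])
single = objective(rows, labels, [1, 0, 0])
```

On this toy data the single-feature mask `[1, 0, 0]` scores higher than the full mask `[1, 1, 1]`, because both reach full dependency but the former uses fewer features. A binary search procedure such as BCS would explore such masks; the search itself is omitted here.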

Journal
Title
IEEE Access
Publisher
IEEE
Publisher Country
United States of America
Indexing
Thomson Reuters
Impact Factor
3.367
Publication Type
Online only
Volume
9
Year
2021
Pages
119430 - 119453