Big data refers to any collection of data so large and complex that conventional database management systems and data processing tools cannot process it. Feature selection serves primarily to reduce the processing burden of data mining models. To speed up the processing of large volumes of data, parallel processing is implemented using the MapReduce (MR) technique, in which a big dataset is divided into smaller partitions. However, existing algorithms often fall short of significantly improving classifier performance. This research advocates using the MR method to conduct feature selection in parallel, thereby improving performance. Additionally, to improve classifier efficacy, this study introduces an approach that combines Online Feature Selection (OFS) with an Accelerated Bat Algorithm (ABA) within a framework that pre-processes features ahead of time, without prior knowledge of the feature space. The proposed OFS-ABA method is designed to select relevant and non-redundant features within the MR framework. Furthermore, an Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier is employed to classify dataset samples; it aggregates the outputs of homogeneous IDMLP classifiers. The proposed feature selection method and classifier are evaluated on three high-dimensional datasets. The results indicate that the MR-OFS-ABA method outperforms existing feature selection methods such as PSO, APSO, and ASAMO (Accelerated Simulated Annealing and Mutation Operator). The performance of the EIDMLP classifier is also compared with that of existing classifiers, including Naïve Bayes (NB), Hoeffding Tree (HT), and Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC)-KNN (K-Nearest Neighbor).
The methodology is applied to three datasets, and the results are compared across four classifiers and three state-of-the-art feature selection algorithms. Overall, the findings show improved accuracy and reduced processing time.
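The idea of running feature selection in parallel over partitions can be illustrated with a minimal sketch. This is not the authors' OFS-ABA method: as an assumption for illustration, each feature is scored by its absolute Pearson correlation with the label, which stands in for the bat-algorithm search. The function names (`partition`, `map_score`, `reduce_scores`, `select_features`) are hypothetical, and the map and reduce steps run sequentially in one process rather than on a MapReduce cluster.

```python
from functools import reduce
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def partition(rows, n_parts):
    """Split the dataset into n_parts smaller partitions (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_score(part):
    """Map step: score every feature on one partition.

    The score is |Pearson correlation| with the label, an assumed
    stand-in for the relevance measure used by OFS-ABA.
    Returns (scores, number_of_rows) so partitions can be weighted.
    """
    labels = [label for _, label in part]
    n_features = len(part[0][0])
    scores = [
        abs(pearson([features[j] for features, _ in part], labels))
        for j in range(n_features)
    ]
    return scores, len(part)


def reduce_scores(a, b):
    """Reduce step: merge two partial results into a row-weighted average."""
    (scores_a, n_a), (scores_b, n_b) = a, b
    total = n_a + n_b
    merged = [(x * n_a + y * n_b) / total for x, y in zip(scores_a, scores_b)]
    return merged, total


def select_features(rows, n_parts, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    mapped = [map_score(p) for p in partition(rows, n_parts)]
    scores, _ = reduce(reduce_scores, mapped)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: each row is (feature_vector, label).
# Feature 0 copies the label, feature 1 is constant, feature 2 is weakly related.
rows = [([float(i % 2), 1.0, float((i * 7) % 5)], i % 2) for i in range(12)]
print(select_features(rows, n_parts=3, k=2))
```

Because the reduce step carries the row count alongside the scores, partial results can be merged in any order, which is the property a real MapReduce job would need.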
Author(s) Details:
Renuka Devi D,
Department of Computer Science, Stella Maris College (Autonomous), Chennai, Tamil Nadu, India.
Swetha Margaret TA,
Department of Computer Science, Stella Maris College (Autonomous), Chennai, Tamil Nadu, India.
Please see the link here: https://stm.bookpi.org/RUMCS-V2/article/view/13960