Saturday, 21 May 2022

An Efficient K-means Algorithm: Generating Clusters Dynamically in MapReduce Framework | Chapter 04 | Novel Research Aspects in Mathematical and Computer Science Vol. 3

Background: K-Means is a popular partition-based clustering technique that divides an input dataset into a set of groups. Its popularity comes from its simplicity and its speed in grouping large amounts of data. Because of the massive quantity of electronic data now being generated, data clustering techniques have had to be adapted to process it. For such data, the performance of K-Means can be improved by running it in a distributed computing environment. The MapReduce paradigm can be combined with K-Means to provide that environment and reduce running time. However, K-Means requires the number of clusters, 'K', to be specified in advance as an input. In the absence of sufficient domain expertise, or for a new and unfamiliar dataset, fixing K beforehand often leads to "forced" clustering of the data, and the correct clusters do not emerge.
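The snippet below shows how K-Means maps onto MapReduce. It is a minimal single-process Python sketch of the standard Lloyd-style iteration. The map step assigns each point to its nearest centroid, a shuffle groups the points by centroid index, and the reduce step averages each group into a new centroid. The function names, the in-memory "splits" that stand in for distributed input blocks, and the convergence tolerance are illustrative assumptions, not the chapter's implementation. K is still fixed here by the length of the initial centroid list, which is exactly the constraint described above.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_phase(split, centroids):
    """Map: emit (centroid_index, (point, 1)) for every point in one input split."""
    for p in split:
        yield nearest(p, centroids), (p, 1)

def shuffle(pairs):
    """Shuffle: group mapper output by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Reduce: sum coordinates and counts for one key, return the mean point."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for p, n in values:
        for d in range(dim):
            total[d] += p[d]
        count += n
    return tuple(t / count for t in total)

def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map/shuffle/reduce until no centroid moves more than `tol`."""
    for _ in range(max_iter):
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = {k: reduce_phase(v) for k, v in shuffle(pairs).items()}
        # A centroid that received no points keeps its previous position.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids

splits = [[(1.0, 1.0), (1.5, 2.0)], [(8.0, 8.0), (9.0, 8.5)], [(1.0, 0.5), (8.5, 9.0)]]
print(kmeans_mapreduce(splits, [(1.0, 1.0), (8.0, 8.0)]))
```

In a real Hadoop job each split would be handled by a separate mapper, with the current centroids distributed to every mapper at the start of each iteration. The sketch simulates that loop sequentially.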

Method: In this research, we propose a novel K-Means-based method that takes only a numerical dataset as input and generates the required number of clusters dynamically, using the MapReduce programming model.
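The abstract does not spell out how the chapter's method generates clusters on the fly. The sketch below is a loosely related illustration only. It shows one simple, well-known way to let the data determine K: threshold-based ("leader") seeding, where any point farther than a chosen distance from every existing seed starts a new cluster. This is a swapped-in technique, not the author's algorithm. The function name `seed_centroids` and its `threshold` parameter are hypothetical.

```python
import math

def seed_centroids(points, threshold):
    """Leader-style seeding (hypothetical helper, not the chapter's method):
    a point farther than `threshold` from every existing seed becomes a new
    seed, so the number of clusters K is discovered rather than supplied."""
    seeds = []
    for p in points:
        # all() over an empty list is True, so the first point always seeds.
        if all(math.dist(p, s) > threshold for s in seeds):
            seeds.append(p)
    return seeds

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
seeds = seed_centroids(points, threshold=3.0)
print(len(seeds), seeds)
```

The resulting seeds could serve as the initial centroids for ordinary K-Means iterations, with K taken as `len(seeds)`. The outcome depends on the chosen threshold and on the order in which points are visited, so this only illustrates removing the need to supply K up front.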

Findings: Using the MapReduce architecture, the proposed approach not only removes the need to supply the value of K in advance but also reduces computation time.

Author Details

Anupama Chadha
Manav Rachna International Institute of Research and Studies, Faridabad, India.

View Book: https://stm.bookpi.org/NRAMCS-V3/article/view/6811
