Background: K-Means is a popular partition-based clustering technique that divides an input dataset into a set of groups. Its popularity stems from its simplicity and its speed in grouping large amounts of data. The massive volume of electronic data now being generated has required data clustering techniques to be adapted in order to process it. When dealing with such data, the performance of K-Means can be improved by running it in a distributed computing environment; in particular, the MapReduce paradigm can be combined with K-Means to distribute the computation and improve time efficiency. K-Means requires the number of clusters, K, to be specified in advance as an input to the algorithm. Without sufficient domain expertise, or for a new and unknown dataset, fixing the number of clusters in advance generally leads to "forced" clustering of the data, and the correct clustering does not emerge.
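To make the pairing of K-Means with MapReduce concrete, the following is a minimal single-machine sketch of one standard K-Means iteration written as a map step and a reduce step. It is an illustration only, not the method proposed in this work: the function names (`map_phase`, `reduce_phase`, `kmeans_iteration`), the sample points, and the starting centroids are all assumptions, and K is still fixed in advance here.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs):
    """Reduce: average all points that share the same centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return {
        key: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for key, pts in groups.items()
    }


def kmeans_iteration(points, centroids):
    """Run one map/reduce round and return the updated centroids."""
    new = reduce_phase(map_phase(points, centroids))
    # A centroid that attracted no points keeps its previous position.
    return [new.get(i, c) for i, c in enumerate(centroids)]


# Hypothetical toy data: two obvious groups and K = 2 chosen by hand.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
updated = kmeans_iteration(points, centroids)
print(updated)
```

In a real distributed setting the map step would run in parallel over partitions of the data and the framework would group the emitted pairs by key before the reduce step; the sketch keeps everything in one process so the data flow is easy to follow.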
Method: In this research, we present a novel K-Means-based method that accepts only a numerical dataset as input and determines the required number of clusters on the fly using the MapReduce programming model.
Author(s) Details
Anupama Chadha
Manav Rachna International Institute of Research and Studies, Faridabad, India.
View Book:- https://stm.bookpi.org/NRAMCS-V3/article/view/6811