A new data clustering algorithm based on critical distance methodology


Kuwil F. H. , Shaar F., Topcu A. E. , Murtagh F.

EXPERT SYSTEMS WITH APPLICATIONS, vol. 129, pp. 296-310, 2019 (Journal indexed in SCI)

  • Volume: 129
  • Publication Date: 2019
  • DOI: 10.1016/j.eswa.2019.03.051
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Pages: pp. 296-310

Abstract

A variety of algorithms have recently emerged in the field of cluster analysis, so an appropriate algorithm can be chosen according to how the data are distributed. Even so, it is difficult for a user to decide a priori which algorithm would be the most appropriate for a given dataset. Graph-based algorithms provide good results for this task. However, they are vulnerable to outliers, and the edges of the tree carry only limited information for splitting a dataset. Thus, in several fields, the need for better clustering algorithms is growing, and robust, dynamic algorithms that improve and simplify the whole process of data clustering have become an urgent need. In this paper, we propose a novel distance-based clustering algorithm called the critical distance clustering algorithm. This algorithm depends on the Euclidean distance between data points and some basic statistical operations. The algorithm is simple, robust, and flexible; it works with real-valued quantitative data of different dimensions, not with qualitative or categorical data. In this work, 26 experiments are conducted using different types of real and synthetic datasets taken from different fields. The results show that the new algorithm outperforms some popular clustering algorithms such as MST-based clustering, K-means, and DBSCAN. Moreover, the algorithm can produce more reasonable clusters even when the dataset contains outliers, without any parameters being specified in advance. It also provides a number of indicators to evaluate the established clusters and support the validity of the clustering. (C) 2019 Published by Elsevier Ltd.
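
The abstract does not spell out the steps of the critical distance algorithm, so the following is only an illustrative sketch of the general idea it names: clustering from Euclidean distances plus basic statistics, with no user-supplied parameters. The threshold rule (mean plus two standard deviations of the nearest-neighbor distances) and the function names `pairwise_distances` and `threshold_clusters` are assumptions made for this example, not the authors' method. This simple rule is also not robust to outliers, which is one of the problems the paper sets out to address.

```python
import math
from statistics import mean, stdev


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def threshold_clusters(points):
    """Label points so that any two points closer than a data-derived
    threshold end up in the same cluster (connected components)."""
    n = len(points)
    if n < 2:
        return [0] * n

    dist = pairwise_distances(points)

    # Distance from each point to its nearest neighbor.
    nearest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]

    # Hypothetical threshold derived from the data alone (an assumption
    # for this sketch, not the paper's definition of critical distance).
    threshold = mean(nearest) + 2 * stdev(nearest)

    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill every point reachable through links <= threshold.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(threshold_clusters(sample))
```

On the two well-separated groups in `sample`, the sketch assigns one label per group. Deriving the threshold from the data keeps the interface parameter-free, in the spirit of the abstract, but a single distant point can inflate the threshold and merge clusters.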