Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and is also among the most cited in the scientific literature.〔Most cited data mining articles according to Microsoft academic search; DBSCAN is on rank 24, when accessed on: 4/18/2010〕 In 2014, the algorithm was awarded the test of time award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, KDD.

==Preliminary==
Consider a set of points in some space to be clustered. For the purpose of DBSCAN clustering, the points are classified as ''core points'', (''density''-)''reachable points'' and ''outliers'', as follows:
* A point <math>p</math> is a core point if at least <math>minPts</math> points are within distance <math>\varepsilon</math> of it, and those points are said to be ''directly reachable'' from <math>p</math>. No points are reachable from a non-core point.
* A point <math>q</math> is reachable from <math>p</math> if there is a path <math>p_1, \ldots, p_n</math> with <math>p_1 = p</math> and <math>p_n = q</math>, where each <math>p_{i+1}</math> is directly reachable from <math>p_i</math> (so all the points on the path must be core points, with the possible exception of <math>q</math>).
* All points not reachable from any other point are outliers.
Now if <math>p</math> is a core point, then it forms a ''cluster'' together with all points (core or non-core) that are reachable from it. Each cluster contains at least one core point; non-core points can be part of a cluster, but they form its "edge", since they cannot be used to reach more points.
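The classification above can be illustrated with a minimal Python sketch. It is not a reference implementation. The function names, the parameter names <code>eps</code> (the distance threshold) and <code>min_pts</code> (the minimum neighbor count), and the sample data are hypothetical. Two further assumptions are made: distances are Euclidean, and a point counts as lying within distance <code>eps</code> of itself. Neighbors are found by brute force, which takes quadratic time.

```python
from math import dist  # Euclidean distance, Python 3.8+

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i].

    Assumption: the point itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def classify(points, eps, min_pts):
    """Label every point as 'core', 'reachable' or 'outlier'."""
    core = {i for i in range(len(points))
            if len(neighbors(points, i, eps)) >= min_pts}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors(points, i, eps)):
            # A non-core point is reachable exactly when some core point
            # has it in its neighborhood (the last step of the path).
            labels.append("reachable")
        else:
            labels.append("outlier")
    return labels

def cluster_of(points, start, eps, min_pts):
    """The core point `start` together with every point reachable from it."""
    cluster, frontier = {start}, [start]
    while frontier:
        i = frontier.pop()
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # non-core: part of the cluster, but reaches nothing
        for j in nbrs:
            if j not in cluster:
                cluster.add(j)
                frontier.append(j)
    return cluster

# Hypothetical sample data: a dense square, one edge point, one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
sample_labels = classify(sample, eps=1.1, min_pts=3)
sample_cluster = cluster_of(sample, 0, eps=1.1, min_pts=3)
```

With this data, the four corners of the square are core points, the point (2, 1) is reachable but not core, and (5, 5) is an outlier. The cluster grown from the first point contains every point except the outlier.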
Reachability is not a symmetric relation since, by definition, no point may be reachable from a non-core point, regardless of distance (so a non-core point may be reachable, but nothing can be reached from it). Therefore a further notion of ''connectedness'' is needed to formally define the extent of the clusters found by DBSCAN. Two points <math>p</math> and <math>q</math> are density-connected if there is a point <math>o</math> such that both <math>p</math> and <math>q</math> are density-reachable from <math>o</math>. Density-connectedness ''is'' symmetric.

A cluster then satisfies two properties:
# All points within the cluster are mutually density-connected.
# If a point is density-reachable from any point of the cluster, it is part of the cluster as well.
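The asymmetry of reachability and the symmetry of density-connectedness can be checked on a small example. The following Python sketch is illustrative only. All function and variable names are hypothetical, distances are assumed to be Euclidean, and a point is assumed to count toward its own neighborhood. The data consists of one core point flanked by two non-core points.

```python
from math import dist  # Euclidean distance, Python 3.8+

def is_core(points, i, eps, min_pts):
    """True if at least min_pts points (including i itself) lie within eps of i."""
    return sum(dist(points[i], q) <= eps for q in points) >= min_pts

def reachable_from(points, start, eps, min_pts):
    """Indices of all points density-reachable from `start`.

    Returns the empty set when `start` is not a core point, because
    nothing is reachable from a non-core point.
    """
    if not is_core(points, start, eps, min_pts):
        return set()
    seen, frontier = {start}, [start]
    while frontier:
        i = frontier.pop()
        if not is_core(points, i, eps, min_pts):
            continue  # a non-core point ends the path
        for j, q in enumerate(points):
            if j not in seen and dist(points[i], q) <= eps:
                seen.add(j)
                frontier.append(j)
    return seen

def density_connected(points, p, q, eps, min_pts):
    """True if some point o exists from which both p and q are reachable."""
    for o in range(len(points)):
        reached = reachable_from(points, o, eps, min_pts)
        if p in reached and q in reached:
            return True
    return False

# Hypothetical data: index 1 is a core point, indices 0 and 2 are not.
line = [(0, 0), (1, 0), (2, 0)]
EPS, MIN_PTS = 1.1, 3

edge_reached_from_core = 0 in reachable_from(line, 1, EPS, MIN_PTS)   # True
core_reached_from_edge = 1 in reachable_from(line, 0, EPS, MIN_PTS)   # False
edges_connected = (density_connected(line, 0, 2, EPS, MIN_PTS)
                   and density_connected(line, 2, 0, EPS, MIN_PTS))   # True
```

Here the two end points cannot reach each other, or anything else, yet they are density-connected through the middle point, so all three belong to one cluster.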