|
''k''-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. ''k''-means clustering aims to partition ''n'' observations into ''k'' clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, ''k''-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes. The algorithm has a loose relationship to the ''k''-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with ''k''-means because of the ''k'' in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by ''k''-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm. == Description == Given a set of observations (x1, x2, …, x''n''), where each observation is a ''d''-dimensional real vector, ''k''-means clustering aims to partition the ''n'' observations into ''k'' (≤ ''n'') sets S = so as to minimize the within-cluster sum of squares (WCSS) (sum of distance functions of each point in the cluster to the K center). In other words, its objective is to find: where ''μ''''i'' is the mean of points in ''S''''i''. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「K-means clustering」の詳細全文を読む スポンサード リンク
|