翻訳と辞書 (Translation and Dictionary)
MinHash : Wikipedia (English edition)
In computer science, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997) and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results.
It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
==Jaccard similarity and minimum hash values==
The Jaccard similarity coefficient is a commonly used indicator of the similarity between two sets. For sets $A$ and $B$ it is defined to be the ratio of the number of elements of their intersection and the number of elements of their union:
: $J(A,B) = \frac{|A \cap B|}{|A \cup B|}.$
This value is 0 when the two sets are disjoint, 1 when they are equal, and strictly between 0 and 1 otherwise. Two sets are more similar (i.e. have relatively more members in common) when their Jaccard index is closer to 1. The goal of MinHash is to estimate $J(A,B)$ quickly, without explicitly computing the intersection and union.
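As a point of reference, the exact coefficient defined above can be computed directly from the two sets. The following is a minimal Python sketch; the function name `jaccard` and the handling of two empty sets (where the ratio is 0/0) are assumptions made for illustration, not part of the definition.

```python
def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is 0/0. Returning 1.0 is a
        # convention chosen for this sketch (an assumption).
        return 1.0
    return len(a & b) / len(union)


# Two of the four distinct elements are shared, so the ratio is 2/4.
print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
```

This direct computation materializes both the intersection and the union, which is the cost MinHash is designed to avoid on large sets.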
Let $h$ be a hash function that maps the members of $A$ and $B$ to distinct integers, and for any set $S$ define $h_{\min}(S)$ to be the minimal member of $S$ with respect to $h$, that is, the member $x$ of $S$ with the minimum value of $h(x)$. Now, if we apply $h_{\min}$ to both $A$ and $B$, we will get the same value exactly when the element of the union $A \cup B$ with minimum hash value lies in the intersection $A \cap B$. If $h$ is chosen at random so that every element of $A \cup B$ is equally likely to have the minimum hash value, the probability of this being true is the ratio above, and therefore:
: $\Pr[h_{\min}(A) = h_{\min}(B)] = J(A,B).$
That is, the probability that $h_{\min}(A) = h_{\min}(B)$ is equal to the similarity $J(A,B)$, where the probability is taken over the random choice of $h$ and the sets $A$ and $B$ are fixed. In other words, if $r$ is the random variable that is one when $h_{\min}(A) = h_{\min}(B)$ and zero otherwise, then $r$ is an unbiased estimator of $J(A,B)$. However, $r$ has too high a variance to be a useful estimator for the Jaccard similarity on its own: it is always zero or one. The idea of the MinHash scheme is to reduce this variance by averaging together several variables constructed in the same way.
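The averaging idea can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the helper names (`h`, `h_min`, `minhash_estimate`), the use of salted BLAKE2b as a stand-in for a randomly chosen hash function, and the default of 200 hash functions are all choices made here. Both input sets are assumed to be non-empty.

```python
import hashlib


def h(x, seed):
    """Hash x to a 64-bit integer; each seed selects a different hash function.

    Salted BLAKE2b is used here as an assumed stand-in for a random hash function.
    """
    data = repr(x).encode("utf-8")
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def h_min(s, seed):
    """Return the member of s with the minimum hash value under hash function `seed`."""
    return min(s, key=lambda x: h(x, seed))


def minhash_estimate(a, b, k=200):
    """Estimate the Jaccard similarity by averaging k zero/one match indicators."""
    a, b = set(a), set(b)
    matches = sum(h_min(a, seed) == h_min(b, seed) for seed in range(k))
    return matches / k


A = set(range(0, 100))
B = set(range(50, 150))
# The exact similarity is 50/150; the estimate should land near that value.
print(minhash_estimate(A, B, k=500))
```

If the $k$ indicators behave as independent variables, each with mean $J(A,B)$, their average has variance $J(A,B)(1 - J(A,B))/k$, so quadrupling $k$ roughly halves the typical error.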

Excerpt source: Wikipedia, the free encyclopedia

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.