Similarity search

Similarity search is the most general term for a range of mechanisms that share one principle: searching (typically very large) spaces of objects where the only available comparator is the similarity between any pair of objects. It is becoming increasingly important in an age of large information repositories whose objects have no natural order, for example large collections of images, sounds and other sophisticated digital objects.

Nearest neighbor search and range queries are important subclasses of similarity search, and a number of solutions exist for both. Research in similarity search is dominated by the inherent problems of searching over complex objects. Such objects cause most known techniques to lose traction over large collections, and many problems remain unsolved. Unfortunately, in many cases where similarity search is necessary, the objects are inherently complex.
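The two query types named above can be illustrated with a brute-force baseline. This is only a sketch: the Euclidean distance, the function names `range_query` and `knn_query`, and the sample points are assumptions chosen for illustration. A linear scan like this is what index structures try to avoid on large collections.

```python
import heapq
import math

def euclidean(a, b):
    """Distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(objects, query, radius, dist=euclidean):
    """Return every object whose distance to the query is at most radius."""
    return [o for o in objects if dist(o, query) <= radius]

def knn_query(objects, query, k, dist=euclidean):
    """Return the k objects closest to the query, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: dist(o, query))

# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
in_range = range_query(points, (0.0, 0.0), 5.0)
nearest = knn_query(points, (0.9, 0.9), 2)
```

Both functions compute one distance per stored object, so the cost grows linearly with the collection size and with the cost of a single comparison.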

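The most general approach to similarity search that still allows the construction of efficient index structures uses the mathematical notion of a metric space, that is, a set of objects with a distance function that is non-negative, symmetric, and satisfies the triangle inequality.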
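To suggest why the metric-space assumption helps indexing, the following sketch uses the triangle inequality to skip distance computations in a range query. The class `PivotIndex`, its single fixed pivot, and the sample data are hypothetical simplifications, not a structure taken from the works listed in the bibliography.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class PivotIndex:
    """Hypothetical one-pivot index: stores d(pivot, o) for every object o."""

    def __init__(self, objects, pivot, dist=euclidean):
        self.dist = dist
        self.pivot = pivot
        # Precompute the distance from the pivot to each object once.
        self.entries = [(dist(pivot, o), o) for o in objects]

    def range_query(self, query, radius):
        """Return (matches, number of query-to-object distances computed)."""
        d_qp = self.dist(query, self.pivot)
        matches, computed = [], 0
        for d_po, obj in self.entries:
            # Triangle inequality: d(q, o) >= |d(q, p) - d(p, o)|.
            # If that lower bound already exceeds the radius, skip o.
            if abs(d_qp - d_po) > radius:
                continue
            computed += 1
            if self.dist(query, obj) <= radius:
                matches.append(obj)
        return matches, computed

# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0), (9.0, 12.0)]
index = PivotIndex(points, pivot=(0.0, 0.0))
matches, computed = index.range_query((6.0, 7.0), 1.5)
```

The pruning step relies only on the metric properties of the distance function, so the same idea applies to any objects for which such a function is available, not just points in the plane.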

A popular approach to similarity search is locality-sensitive hashing (LSH).^[1] It hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied to nearest neighbor search on large-scale, high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.^[2]
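The bucketing idea can be sketched with one well-known LSH family, random hyperplanes for cosine similarity. This family is chosen here for brevity and is not necessarily the scheme used in the cited reference. The class name `HyperplaneLSH`, its parameters, and the sample vectors are assumptions made for illustration.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (a single hash table)."""

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        """One bit per hyperplane: which side of the plane vec lies on."""
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0
            for plane in self.planes
        )

    def add(self, key, vec):
        """Store key in the bucket addressed by the signature of vec."""
        self.buckets[self.signature(vec)].append(key)

    def candidates(self, vec):
        """Keys sharing the query's bucket; these still need exact checking."""
        return list(self.buckets.get(self.signature(vec), []))

lsh = HyperplaneLSH(dim=3, num_bits=8)
lsh.add("a", [1.0, 2.0, 3.0])
lsh.add("a_scaled", [2.0, 4.0, 6.0])      # same direction as "a"
lsh.add("a_negated", [-1.0, -2.0, -3.0])  # opposite direction
found = lsh.candidates([1.0, 2.0, 3.0])
```

Vectors separated by a small angle agree on most bits and therefore tend to share a bucket. Practical systems typically use several tables to raise the chance that true neighbors collide in at least one of them.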

Bibliography

Pei Lee, Laks V. S. Lakshmanan, Jeffrey Xu Yu: On Top-k Structural Similarity Search. ICDE 2012:774-785
Zezula, P., Amato, G., Dohnal, V., and Batko, M. Similarity Search - The Metric Space Approach. Springer, 2006. ISBN 0-387-29146-6
Samet, H. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006. ISBN 0-12-369446-9
E. Chavez, G. Navarro, R.A. Baeza-Yates, J.L. Marroquin, Searching in metric spaces, ACM Computing Surveys, 2001
M.L. Hetland, The Basic Principles of Metric Indexing, Swarm Intelligence for Multi-objective Problems in Data Mining, Studies in Computational Intelligence Volume 242, 2009, pp 199–232

References

[1] Gionis, A.; Indyk, P.; Motwani, R. (1999). "Similarity search in high dimensions via hashing". VLDB.
[2] Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3".

This article is issued from Wikipedia - version of the 7/25/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.