Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

K nearest neighbor approximation query method based on multi-layer locality sensitive hashing

A local sensitive hash and query method technology, applied in the field of data analysis, can solve the problem of uneven LSH bucket size, etc., and achieve the effect of improving kNN search efficiency, uniform size distribution, and obvious advantages

Inactive Publication Date: 2019-11-22
NORTHEASTERN UNIV LIAONING
View PDF0 Cites 11 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0015] Aiming at the problem of uneven size of traditional LSH buckets, the present invention provides a multi-layer Locality-Sensitive Hashing (Locality-Sensitive Hashing) method that supports k Nearest Neighbors (kNN) approximate query, a multi-layer Locality-Sensitive Hashing Hash (LayerLSH) structure and an approximate kNN query method based on this structure, it achieves the effect of adaptively adjusting the size of the buckets by re-dividing the existing hash buckets or merging the existing hash buckets, so that it can effectively Reduce the search cost and improve the accuracy when processing kNN queries with skewed distribution data

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • K nearest neighbor approximation query method based on multi-layer locality sensitive hashing
  • K nearest neighbor approximation query method based on multi-layer locality sensitive hashing
  • K nearest neighbor approximation query method based on multi-layer locality sensitive hashing

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0048] The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0049] In the method of this embodiment, the software environment is WINDOWS 7 system, and the programming language is Java.

[0050] Step 1: Use the traditional method to construct the LSH index structure. Such as figure 1 As shown, wherein the LSH parameters l=1, m=3, the LSH structure of the 0th layer is the result of the traditional LSH construction, which is the 0th layer hash table (level 0).

[0051] Step 2: Determine the overload bucket and the underload bucket according to the user's expected recall rate α and precision rate β. Determine the upper limit and lower limit of the hash bucket according to the user's expected recall rate α and precision rate β, and then determine the overload bucket and underload bucket.

[0052] Step 3: Carry out hash division on the overloaded bucket with new LSH parameters. Such as figure 1 A...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention belongs to the field of data analysis, and relates to a k nearest neighbor approximation query method based on multi-layer local sensitive hashing. The method comprises the following steps: firstly evaluating the number of data points mapped to each hash bucket, determining an overload hash bucket and an underload hash bucket according to the number of the data points in each hash bucket, then further hashing and dividing the overload hash bucket into a plurality of sub-buckets, and merging the underload hash buckets at the same time; and recursively performing re-hashing on thesub-buckets which are still overloaded after re-division, and balancing the sizes of the plurality of hash buckets as much as possible after multiple times of re-hashing. Therefore, the LSH index structure becomes a multi-layer tree-like structure. According to the method, the initially constructed LSH hash table is reconstructed, so that the kNN search efficiency of the query points in the denseregion and the kNN search accuracy of the query points in the sparse region are improved. The hash buckets in the multi-layer local sensitive hash structure are relatively uniform in size distribution, and the advantages are very obvious when kNN search is carried out on obliquely distributed big data on the hash buckets.

Description

technical field [0001] The invention belongs to the field of data analysis, and relates to a k-nearest-neighbor similar query method based on multi-layer local sensitive hash. Background technique [0002] kNN search is a commonly used operation in data analysis. Its purpose is to find the k records closest to the query point (that is, the most similar) from the data set. If you want to find the exact kNN result for a given query point, you have to traverse the entire dataset, calculate the distance of each point in the dataset to the query point, and then find the k closest points to the query point, which requires searching the entire dataset. If kNN search is performed on high-dimensional large-scale data sets, the cost of obtaining accurate kNN results is even greater. If it is necessary to feed back the query results to the user in real time, it is obviously not advisable to perform an accurate kNN query. The approximate kNN search technology is proposed to speed up th...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
Patent Type & Authority Applications(China)
IPC IPC(8): G06F16/22G06F16/2458
CPCG06F16/2255G06F16/2462
Inventor 张岩峰
Owner NORTHEASTERN UNIV LIAONING
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products