Hash coding method and device

A hash coding technology, applied in database indexing, structured data retrieval, etc. It addresses the problem that data cannot be distinguished well, which degrades large-scale data indexing and nearest-neighbor queries, and achieves the effect of preserving the neighbor structure.

Active Publication Date: 2019-03-19
FUJITSU LTD
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

Single-threshold quantization often cannot distinguish data well, so adjacent data may receive different codes, which in turn degrades large-scale data indexing and nearest-neighbor queries based on hash coding.

Method used

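The data are projected onto multiple projection directions; the projection values in each direction are clustered with k-means into k cluster centers; one threshold is placed between every two adjacent cluster centers, and the (k-1) thresholds are obtained according to the principle of entropy maximization; the data in each dimension are encoded according to these thresholds and a preset coding scheme, and the codes under all projections are concatenated into the final binary code.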

Examples

Embodiment 1

[0042] An embodiment of the present invention provides a hash coding method. Figure 1 is the processing flowchart of this method. Referring to Figure 1, the method includes:

[0043] Step 101: Generate projections, that is, generate multiple projection directions based on a given training data set;

[0044] Step 102: Generate cluster centers, that is, project all the training data in the training data set onto each of the projection directions to obtain a set of projection values corresponding to each projection direction, and use a preset clustering algorithm to cluster the projection values corresponding to each projection direction, obtaining a predetermined number of cluster centers for each projection direction;

[0045] Step 103: Determine the threshold, that is, according to the predetermined number of cluster centers corresponding to each projection direction, determine multiple thresholds corresponding to each projection direction accordi...
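Steps 101 and 102 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the source does not say how the projection directions are generated, so random Gaussian directions (as in locality-sensitive hashing) are assumed, and the preset clustering algorithm is taken to be one-dimensional k-means, which the abstract mentions. The function names `generate_projections`, `project`, `kmeans_1d`, and `cluster_centers_per_direction` are hypothetical.

```python
import random
from typing import List, Sequence


def generate_projections(dim: int, num_projections: int, seed: int = 0) -> List[List[float]]:
    """Step 101 (sketch): draw random Gaussian projection directions.

    ASSUMPTION: the source does not say how directions are generated; random
    Gaussian directions, as in locality-sensitive hashing, are used for illustration.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_projections)]


def project(data: Sequence[Sequence[float]], direction: Sequence[float]) -> List[float]:
    """Project every training vector onto one direction (dot product)."""
    return [sum(x * w for x, w in zip(row, direction)) for row in data]


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[float]:
    """Step 102 (sketch): plain 1-D k-means returning k centers in ascending order."""
    vals = sorted(values)
    n = len(vals)
    # Initialise the centers at evenly spaced quantiles of the projection values.
    centers = [vals[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in vals:
            # Assign each projection value to its nearest current center.
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def cluster_centers_per_direction(data, directions, k):
    """Steps 101-102 combined: k sorted cluster centers for every direction."""
    return [kmeans_1d(project(data, d), k) for d in directions]


data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [10.0, 9.8], [9.9, 10.2]]
directions = generate_projections(dim=2, num_projections=4)
centers = cluster_centers_per_direction(data, directions, k=3)
```

Returning the centers in ascending order makes it straightforward to place one threshold between each pair of adjacent centers in Step 103.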

Embodiment 2

[0083] The embodiment of the present invention also provides a hash coding device. Since the device solves the problem on a principle similar to the method in Embodiment 1, its specific implementation can refer to the implementation of the method in Embodiment 1, and identical content is not repeated.

[0084] Figure 8 is a schematic diagram of the composition of the hash coding device. As shown in Figure 8, the device includes: a projection unit 801, a clustering unit 802, a threshold determination unit 803, and an encoding unit 804, wherein:

[0085] The projection unit 801 generates multiple projection directions based on a given training data set.

[0086] The clustering unit 802 projects all the training data in the training data set onto each of the projection directions to obtain a set of projection values corresponding to each projection direction, and uses a preset clustering algorithm to cluster, for each projection direction, the projection values corresp...
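The four units of the device map naturally onto four pluggable functions. The sketch below is hypothetical: the class `HashCodingDevice`, the helper `dot`, and the demo units (axis-aligned directions, quantile-based centers standing in for the preset clustering algorithm, midpoint thresholds, and a thermometer-style code) are illustrative assumptions, not the units specified in the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class HashCodingDevice:
    """Hypothetical wiring of units 801-804; each unit is an injected callable."""
    projection_unit: Callable[[List[Vector]], List[Vector]]            # 801
    clustering_unit: Callable[[List[float]], List[float]]              # 802
    threshold_unit: Callable[[List[float], List[float]], List[float]]  # 803
    encoding_unit: Callable[[float, List[float]], List[int]]           # 804

    def fit(self, training_data: List[Vector]) -> "HashCodingDevice":
        # 801: generate projection directions from the training set.
        self.directions = self.projection_unit(training_data)
        self.thresholds: List[List[float]] = []
        for d in self.directions:
            # 802: project all training data, then cluster the projection values.
            values = [dot(x, d) for x in training_data]
            centers = self.clustering_unit(values)
            # 803: derive this direction's thresholds from its cluster centers.
            self.thresholds.append(self.threshold_unit(centers, values))
        return self

    def encode(self, x: Vector) -> List[int]:
        # 804: encode each projection value and concatenate the per-direction codes.
        bits: List[int] = []
        for d, ts in zip(self.directions, self.thresholds):
            bits.extend(self.encoding_unit(dot(x, d), ts))
        return bits


# --- Hypothetical demo units (illustrative stand-ins only) ---
def axis_directions(data: List[Vector]) -> List[Vector]:
    dim = len(data[0])
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def quantile_centers(values: List[float], k: int = 3) -> List[float]:
    # Placeholder for the preset clustering algorithm (NOT k-means).
    s = sorted(values)
    return [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]


def midpoint_thresholds(centers: List[float], _values: List[float]) -> List[float]:
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def thermometer_code(value: float, thresholds: List[float]) -> List[int]:
    return [1 if value > t else 0 for t in thresholds]


device = HashCodingDevice(axis_directions, quantile_centers, midpoint_thresholds, thermometer_code)
device.fit([[float(i), float(i)] for i in range(6)])
code = device.encode([2.5, 4.5])
```

Keeping each unit as a separate callable mirrors the unit structure of Figure 8 and makes the threshold rule easy to swap, which is the point on which Embodiment 3 differs from Embodiment 2.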

Embodiment 3

[0102] The embodiment of the present invention also provides a hash coding device, which differs from the hash coding device in Embodiment 2 in that the threshold determination unit determines the multiple thresholds corresponding to each projection direction by means of linear weighting. Content identical to Embodiment 2 is not repeated.
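The source does not give the linear-weighting formula. One plausible reading, assumed here, is that each threshold is a weighted combination of two adjacent cluster centers, t_i = alpha * c_i + (1 - alpha) * c_(i+1), with a hypothetical weight alpha in [0, 1]; alpha = 0.5 reduces to the midpoint. The function name `linear_weighted_thresholds` is likewise hypothetical.

```python
from typing import List, Sequence


def linear_weighted_thresholds(centers: Sequence[float], alpha: float = 0.5) -> List[float]:
    """One threshold between each pair of adjacent cluster centers.

    ASSUMPTION: t_i = alpha * c_i + (1 - alpha) * c_(i+1) is an illustrative
    weighting; the source does not give the formula, and `alpha` is hypothetical.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c = sorted(centers)
    # k sorted centers yield k - 1 thresholds, one per adjacent pair.
    return [alpha * lo + (1.0 - alpha) * hi for lo, hi in zip(c, c[1:])]


midpoints = linear_weighted_thresholds([7.0, 1.0, 3.0])
skewed = linear_weighted_thresholds([1.0, 3.0, 7.0], alpha=0.25)
```

Moving alpha away from 0.5 shifts every threshold toward one neighbouring center, which changes how many training values fall on each side.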

[0103] Figure 9 is a schematic diagram of the composition of the hash coding device in this embodiment. As shown in Figure 9, the hash coding device includes: a projection unit 901, a clustering unit 902, a threshold determination unit 903, and an encoding unit 904, wherein:

[0104] The projection unit 901 generates multiple projection directions based on a given training data set.

[0105] The clustering unit 902 projects all the training data in the training data set onto each of the projection directions to obtain a set of projection values corresponding to ...

Abstract

An embodiment of the invention provides a hash coding method and device. The method comprises the following steps: first, the data are projected through multiple projections to obtain multidimensional projection values, and the data in each dimension are clustered with k-means to obtain k cluster centers; assuming that one threshold lies between every two adjacent cluster centers, (k-1) thresholds are obtained according to the principle of entropy maximization; the data in each dimension are encoded according to the thresholds and a preset coding scheme; finally, the codes under all projections are concatenated to obtain the final binary code. Because multiple thresholds are applied in each projection direction, the method overcomes the inability of single-threshold quantization to divide the data effectively, and multiple random thresholds can be selected and used; quantizing the data in each dimension with multiple adaptively learned thresholds effectively preserves the nearest-neighbor structure and provides a good foundation for large-scale data indexing and nearest-neighbor queries.
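The entropy-maximization and concatenation steps can be sketched as follows. Assumptions, none of which come from the source: each threshold is searched only within the interval between its two adjacent cluster centers; the search is coordinate ascent over candidate cut points (midpoints between consecutive sorted projection values); and the unspecified "preset coding scheme" is taken to be a thermometer code, in which bit j is 1 when the value exceeds threshold j. The function names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def bin_entropy(values: Sequence[float], thresholds: Sequence[float]) -> float:
    """Shannon entropy (bits) of how `values` fall into len(thresholds) + 1 intervals.

    A value lands in interval j when it exceeds exactly j of the sorted thresholds.
    """
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[bisect_left(thresholds, v)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def entropy_max_thresholds(values: Sequence[float], centers: Sequence[float],
                           sweeps: int = 3) -> List[float]:
    """(k - 1) thresholds, each kept strictly between two adjacent centers.

    ASSUMPTION: coordinate ascent over candidate cut points is an illustrative
    search strategy; the source only states the entropy-maximization principle.
    """
    c = sorted(centers)
    s = sorted(values)
    t = [(a + b) / 2 for a, b in zip(c, c[1:])]  # start at the midpoints
    for _ in range(sweeps):
        for i in range(len(t)):
            lo, hi = c[i], c[i + 1]
            cuts = ((a + b) / 2 for a, b in zip(s, s[1:]))
            cands = [t[i]] + [m for m in cuts if lo < m < hi]
            # Pick the cut that maximizes entropy with the other thresholds fixed.
            t[i] = max(cands, key=lambda x: bin_entropy(s, t[:i] + [x] + t[i + 1:]))
    return t


def encode_with_thresholds(projection_values: Sequence[float],
                           thresholds_per_direction: Sequence[Sequence[float]]) -> List[int]:
    """ASSUMED thermometer code per direction, concatenated across all directions."""
    bits: List[int] = []
    for v, ts in zip(projection_values, thresholds_per_direction):
        bits.extend(1 if v > t else 0 for t in ts)
    return bits


values = [0.0] * 4 + [1.0, 2.0, 3.0] + [10.0] * 3
learned = entropy_max_thresholds(values, centers=[0.0, 10.0])
```

Without the interval constraint, maximizing entropy would simply yield equal-frequency (quantile) cut points; keeping each threshold between adjacent cluster centers ties it to the cluster structure. A thermometer code also differs by exactly one bit between adjacent intervals, which is one way a multi-threshold code can keep nearby values close in Hamming distance.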

Description

Technical field

[0001] The invention relates to the field of data retrieval, in particular to a hash coding method and device.

Background art

[0002] With the explosive growth of data on the Internet, such as text, images, and videos, indexing and nearest neighbor queries on large-scale data have attracted more and more attention.

[0003] Hash coding is a commonly used technique for converting arbitrary real-valued multidimensional data into a 0-1 binary string. Because it requires little storage and supports fast queries, it is well suited to large-scale data indexing and search.

[0004] Traditional hash coding first generates several projections and then applies single-threshold quantization to the projected data in each projection direction to obtain a 0-1 binary code string. Single-threshold quantization often cannot distinguish data well, so adjacent data may receive different codes, which in turn affects the effect of larg...
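A small illustration of the single-threshold problem described in [0004], assuming the traditional single threshold is the median of the projection values (an illustrative choice; the source does not specify it). The function name `single_threshold_codes` is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def single_threshold_codes(projections: Sequence[float]) -> List[int]:
    """One bit per value in a single projection direction: 1 if above the threshold.

    ASSUMPTION: the single threshold is taken to be the median, for illustration.
    """
    threshold = median(projections)
    return [1 if v > threshold else 0 for v in projections]


projections = [0.0, 4.5, 5.5, 10.0]
codes = single_threshold_codes(projections)  # codes == [0, 0, 1, 1]
```

Here 4.5 and 5.5 are only 1.0 apart yet receive different bits, while 0.0 and 4.5 are 4.5 apart yet share a bit: neighbors that straddle the single threshold are separated.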

Application Information

Patent type and authority: Patent (China)
IPC(8): G06F16/22
Inventors: 刘汝杰 (Liu Rujie), 刘曦 (Liu Xi)
Owner: FUJITSU LTD