
A method and device for establishing an index

An index building technology, applied to text database indexing, unstructured text data retrieval, and special data processing applications. It addresses the inability of traditional hash-based indexes to infer the similarity of the original content from the similarity of signatures, and has the effect of reducing the number of stored indexes and improving retrieval speed.

Active Publication Date: 2020-07-03
RUN TECH CO LTD BEIJING

AI Technical Summary

Problems solved by technology

Index technology based on traditional hash functions therefore cannot determine how similar the original contents are by comparing the similarity of their signatures, which limits its use for similarity search.
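The limitation can be illustrated with a minimal Python sketch. It is not taken from the patent: SHA-256 stands in for "a traditional hash function", and the two sample texts and the helper names (`sha256_hex`, `differing_fraction`, `jaccard`) are assumptions chosen for illustration. The two texts share most of their words, yet their digests differ in most positions, so comparing the digests says nothing about how similar the texts are.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length digests differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


# Two illustrative texts that differ only in their last word.
text_a = "index mapping bucket minhash similar text retrieval"
text_b = "index mapping bucket minhash similar text search"

digest_a = sha256_hex(text_a)
digest_b = sha256_hex(text_b)

# Word-level similarity of the original content.
content_similarity = jaccard(set(text_a.split()), set(text_b.split()))

# Position-wise disagreement of the two signatures.
signature_difference = differing_fraction(digest_a, digest_b)

print(f"content similarity:   {content_similarity:.2f}")
print(f"signature difference: {signature_difference:.2f}")
```

A locality-sensitive scheme such as MinHash is designed so that similar inputs tend to produce matching signature components, which is the property the method described in the embodiments relies on.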



Examples


Embodiment 1

[0049] Figure 1 is a schematic flowchart of an index building method provided in Embodiment 1 of the present invention. The method is applicable to building an index for a large amount of text data and can be executed by an index building device. As shown in Figure 1, the method includes the following:

[0050] Step 110, extracting feature words of the target text.

[0051] Specifically, the feature words of the target text can be extracted on the basis of Chinese word segmentation: the text is segmented and the resulting words are ranked by frequency, and text semantic analysis and part-of-speech tuning can further be used to find the segmented words that accurately reflect the meaning of the text. These words are used as feature words. The feature words are then sorted according to a preset strategy to obtain the feature character st...
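The extraction and sorting described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: a regular-expression tokenizer on English text stands in for Chinese word segmentation (which would need a dedicated segmenter), semantic analysis and part-of-speech tuning are omitted, lexicographic order is an assumed "preset strategy", and `STOP_WORDS`, `extract_feature_words` and `build_feature_string` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "for"})


def extract_feature_words(text: str, top_k: int = 5) -> list[str]:
    """Return the top_k most frequent non-stop-word tokens of the text."""
    tokens = [
        token
        for token in re.findall(r"[a-z0-9]+", text.lower())
        if token not in STOP_WORDS
    ]
    counts = Counter(tokens)
    # Rank by descending frequency; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def build_feature_string(feature_words: list[str], sep: str = " ") -> str:
    """Sort the feature words (assumed strategy: lexicographic) and join them."""
    return sep.join(sorted(feature_words))


sample = "The index maps the hash to the text; the index stores the hash."
words = extract_feature_words(sample, top_k=3)
feature_string = build_feature_string(words)
print(words)           # ['hash', 'index', 'maps']
print(feature_string)  # hash index maps
```

Sorting before joining means that two texts with the same feature words yield the same feature string regardless of the order in which the words appeared.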

Embodiment 2

[0072] Figure 2 is a schematic flowchart of an index establishment method provided by Embodiment 2 of the present invention. On the basis of the technical solution of Embodiment 1, this embodiment adds a recommendation operation for similar texts. Recommending similar texts based on the index established by the method of Embodiment 1 can achieve high efficiency and accuracy. As shown in Figure 2, the method includes:

[0073] Step 210, extracting feature words of the target text.

[0074] Step 220, sort the feature words to obtain a feature string.

[0075] Step 230: Apply the MinHash algorithm to the feature string to obtain the hash value corresponding to the target text.

[0076] Step 240: If there is an index mapping bucket matching the hash value in the mapping buffer pool, establish an index between the hash value and the target text in the index mapping bucket, and then match the index map...
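Steps 230 and 240 can be sketched in Python as follows. The sketch makes several assumptions that the text above does not specify: the feature string is split on whitespace into tokens, MinHash uses four seeded BLAKE2b hashes, a bucket "matches" when its key equals the full signature, and the names `minhash_signature`, `estimated_jaccard` and `MappingBufferPool` are hypothetical. The `similar` lookup is only one plausible way to use the buckets for recommendation.

```python
import hashlib


def _seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of a token; the seed selects one member of a hash family."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(feature_string: str, num_hashes: int = 4) -> tuple[int, ...]:
    """MinHash signature: the minimum hash of the token set under each seed."""
    tokens = set(feature_string.split())
    if not tokens:
        raise ValueError("feature string is empty")
    return tuple(
        min(_seeded_hash(token, seed) for token in tokens)
        for seed in range(num_hashes)
    )


def estimated_jaccard(sig_a: tuple[int, ...], sig_b: tuple[int, ...]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


class MappingBufferPool:
    """Maps a hash value (here: the full signature) to an index mapping bucket."""

    def __init__(self) -> None:
        self.buckets: dict[tuple[int, ...], list[str]] = {}

    def index(self, text_id: str, feature_string: str) -> tuple[int, ...]:
        key = minhash_signature(feature_string)
        bucket = self.buckets.get(key)
        if bucket is None:
            # No matching bucket exists: create one for this hash value.
            bucket = self.buckets[key] = []
        # Record the hash value -> text association in the bucket.
        bucket.append(text_id)
        return key

    def similar(self, feature_string: str) -> list[str]:
        """Return the ids of texts indexed in the bucket matching the query."""
        return list(self.buckets.get(minhash_signature(feature_string), []))


pool = MappingBufferPool()
key_1 = pool.index("doc1", "hash index maps")
key_2 = pool.index("doc2", "maps index hash")  # same token set, different order
print(pool.similar("index hash maps"))  # ['doc1', 'doc2']
```

With exact matching on the full signature, only texts whose token sets are nearly identical share a bucket. Using fewer hash functions, or keying buckets on a band of the signature, would group looser matches together at the cost of more false positives.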

Embodiment 3

[0095] Figure 4 is a schematic structural diagram of an index establishment device provided in Embodiment 3 of the present invention. As shown in Figure 4, the device includes: a feature word extraction module 410, a sorting module 420, a first computing module 430, a first building module 440 and a second building module 450.

[0096] The feature word extraction module 410 is used to extract the feature words of the target text; the sorting module 420 is used to sort the feature words to obtain a feature string; the first computing module 430 is used to apply the MinHash algorithm to the feature string to obtain the hash value corresponding to the target text; the first building module 440 is used to search the mapping buffer pool for an index mapping bucket matching the hash value and, if one exists, to establish an index between the hash value and the target text in that index mapping bucket; the second establishmen...


Abstract

An embodiment of the invention discloses an index creating method and device. The method includes: extracting feature words of a target text; sorting the feature words to obtain a feature string; applying the MinHash algorithm to the feature string to obtain a hash value corresponding to the target text; and searching a mapping buffer pool to determine whether an index mapping bucket matching the hash value exists. If it exists, an index between the hash value and the target text is created in that index mapping bucket; if not, an index mapping bucket matching the hash value is established and the index between the hash value and the target text is created in it. The method reduces the number of stored indexes. Because the indexes of similar texts are created in the same index mapping bucket, similar texts are classified together and similar text retrieval is faster.

Description

Technical field

[0001] Embodiments of the present invention relate to the field of information indexing and query, and in particular to an index establishment method and device.

Background

[0002] In recent years, with the rapid development and popularization of Internet technology, we often need to quickly and accurately find the data we want in massive data sets. This process is called similarity search.

[0003] With the rapid increase of network data, search speed has become a major bottleneck for similarity search. How to design a fast and effective index structure has therefore become an urgent need for similarity search in the era of big data. One of the currently common indexing technologies is the tree-structured index, typically a KD tree. A tree-structured index adopts a subspace-division design: the object data is divided into several subspaces, each of which contains similar data. When searching, only search w...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31
CPC: G06F16/325
Inventor: 谢永恒张侠火一莽万月亮
Owner: RUN TECH CO LTD BEIJING