A method and device for establishing an index
An index building and indexing technology, applied in text database indexing, unstructured text data retrieval, special-purpose data processing applications, and similar fields. It addresses the problem that conventional signatures cannot be compared with one another to judge how similar the original content is. Its effects are reduced index storage and improved retrieval speed.
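The signature problem named in the summary can be made concrete. An ordinary digest such as SHA-1 changes completely when a single word changes, so comparing two digests says nothing about how similar the texts are. A MinHash signature behaves differently. For each hash function, the probability that two sets share the same minimum hash equals their Jaccard similarity, so the fraction of agreeing signature positions estimates that similarity. This is a minimal sketch under stated assumptions: the seeded SHA-1 hash family, the helper names (`seeded_hash`, `minhash_signature`, `estimated_similarity`, `jaccard`), and the default signature length of 128 are illustrative choices, not details taken from the patent.

```python
import hashlib


def seeded_hash(seed: int, token: str) -> int:
    # One member of an assumed hash family: SHA-1 of "seed:token", truncated
    # to 64 bits. Each seed plays the role of one random permutation.
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: set[str], num_hashes: int = 128) -> list[int]:
    # For every hash function, keep the smallest hash over all tokens.
    if not tokens:
        raise ValueError("cannot sign an empty token set")
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # Fraction of positions where the two signatures agree.
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    # Exact Jaccard similarity, for comparison with the estimate.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {f"w{i}" for i in range(30)}      # w0..w29
    b = {f"w{i}" for i in range(10, 40)}  # w10..w39, 20 words shared with a
    print(jaccard(a, b))  # 0.5
    print(estimated_similarity(minhash_signature(a), minhash_signature(b)))
```

Longer signatures tighten the estimate. Its standard error is roughly the square root of J(1 - J)/k for k hash functions.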
Examples
Embodiment 1
[0049] Figure 1 is a schematic flowchart of the index building method provided in Embodiment 1 of the present invention. The method of this embodiment is suited to building an index over a large volume of text data and can be executed by an index building device. As shown in Figure 1, the method specifically includes the following steps:
[0050] Step 110: extract the feature words of the target text.
[0051] Specifically, the feature words of the target text can be extracted on the basis of Chinese word segmentation. The text is segmented and the resulting words are ranked by frequency; text semantic analysis and part-of-speech tuning can further be used to find the segmented words that accurately reflect the meaning of the text. These words serve as the feature words. The feature words are then sorted according to a preset strategy to obtain the feature character st...
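Step 110 and the sorting that follows it can be sketched as below. This is a minimal illustration, not the patented procedure. A regular-expression tokenizer stands in for Chinese word segmentation, and a tiny hypothetical stop-word list stands in for semantic analysis and part-of-speech tuning. The "preset strategy" is not specified in the text, so the sketch assumes one: keep the `top_n` most frequent words, then order them lexicographically so that the resulting string does not depend on where the words appear in the text. `extract_feature_words`, `build_feature_string`, and `STOP_WORDS` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a curated one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for", "on"}


def extract_feature_words(text: str, top_n: int = 5) -> list[str]:
    # Stand-in for Chinese word segmentation: split on non-word characters.
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens)
    # Rank by descending frequency, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_n]


def build_feature_string(feature_words: list[str], sep: str = " ") -> str:
    # Assumed "preset strategy": lexicographic order, so the string does not
    # depend on where the words occur in the text.
    return sep.join(sorted(feature_words))


if __name__ == "__main__":
    text = "The index maps each text to a bucket. The index stores the hash of each text."
    words = extract_feature_words(text)
    print(words)                        # ['each', 'index', 'text', 'bucket', 'hash']
    print(build_feature_string(words))  # bucket each hash index text
```

A canonical ordering matters for the later hashing step. Two texts with the same feature words then produce the same feature string.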
Embodiment 2
[0072] Figure 2 is a schematic flowchart of the index building method provided in Embodiment 2 of the present invention. Building on the technical solution of Embodiment 1, this embodiment adds a similar-text recommendation operation that recommends similar texts on the basis of the index established by the method disclosed in Embodiment 1, which makes similar-text recommendation both efficient and accurate. As shown in Figure 2, the method includes:
[0073] Step 210: extract the feature words of the target text.
[0074] Step 220: sort the feature words to obtain a feature string.
[0075] Step 230: apply the MinHash algorithm to the feature string to obtain the hash value corresponding to the target text.
[0076] Step 240: if an index mapping bucket matching the hash value exists in the mapping buffer pool, establish an index between the hash value and the target text in that bucket, and then match the index map...
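Steps 230 and 240 can be sketched as follows. This sketch rests on several assumptions that the text does not confirm. It assumes a single-function MinHash (the minimum seeded 64-bit SHA-1 hash over the words of the feature string) as the "hash value". It treats a bucket as "matching" when its key equals that value. `MappingBufferPool`, `add`, and `candidates` are hypothetical names. The source is truncated for the case where no bucket matches, so opening a new bucket there is purely an assumption. Treating texts that share a bucket as recommendation candidates is likewise an assumed use of the index.

```python
import hashlib


def minhash_value(feature_string: str, seed: int = 0) -> int:
    # Single-function MinHash over the words of the feature string: the
    # smallest 64-bit seeded SHA-1 hash. Two texts receive the same value
    # with probability equal to the Jaccard similarity of their word sets.
    words = set(feature_string.split())
    if not words:
        raise ValueError("feature string contains no words")
    return min(
        int.from_bytes(hashlib.sha1(f"{seed}:{w}".encode("utf-8")).digest()[:8], "big")
        for w in words
    )


class MappingBufferPool:
    """Hypothetical in-memory pool of index mapping buckets keyed by hash value."""

    def __init__(self) -> None:
        self.buckets: dict[int, list[str]] = {}

    def add(self, text_id: str, feature_string: str) -> int:
        h = minhash_value(feature_string)  # Step 230
        bucket = self.buckets.get(h)
        if bucket is not None:
            bucket.append(text_id)  # Step 240: a matching bucket exists
        else:
            # The source is truncated for this case; opening a new bucket
            # is an assumption made for the sketch.
            self.buckets[h] = [text_id]
        return h

    def candidates(self, feature_string: str) -> list[str]:
        # Texts in the same bucket are treated as similar-text candidates
        # (an assumed use of the index for recommendation).
        return list(self.buckets.get(minhash_value(feature_string), []))


if __name__ == "__main__":
    pool = MappingBufferPool()
    pool.add("doc1", "bucket hash index text")
    pool.add("doc2", "text index hash bucket")  # same word set, same bucket
    print(pool.candidates("bucket hash index text"))  # ['doc1', 'doc2']
```

With only one hash function, near-duplicates that happen to differ in their minimum word land in different buckets. A production design would typically use several hash values per text to raise recall.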
Embodiment 3
[0095] Figure 4 is a schematic structural diagram of the index building device provided in Embodiment 3 of the present invention. As shown in Figure 4, the device includes a feature word extraction module 410, a sorting module 420, a first computing module 430, a first building module 440, and a second building module 450.
[0096] The feature word extraction module 410 is configured to extract the feature words of the target text; the sorting module 420 is configured to sort the feature words to obtain a feature string; the first computing module 430 is configured to apply the MinHash algorithm to the feature string to obtain the hash value corresponding to the target text; the first building module 440 is configured to check whether an index mapping bucket matching the hash value exists in the mapping buffer pool and, if it does, to establish an index between the hash value and the target text in that bucket; the second establishmen...
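The device of Embodiment 3 can be pictured as one object whose methods correspond to modules 410 to 450. This is a hypothetical sketch: the class and method names are invented, and frequency ranking, lexicographic sorting, and single-function MinHash are assumed module behaviours. The text describing the second building module 450 is truncated. Its role here, opening a new bucket when module 440 finds no match, is an assumption.

```python
import hashlib
import re
from collections import Counter


class IndexBuildingDevice:
    """Hypothetical sketch of the Embodiment 3 device; names are invented."""

    def __init__(self, top_n: int = 5) -> None:
        self.top_n = top_n
        self.pool: dict[int, list[str]] = {}  # mapping buffer pool

    # Feature word extraction module 410 (frequency ranking, assumed)
    def extract(self, text: str) -> list[str]:
        counts = Counter(re.findall(r"\w+", text.lower()))
        return sorted(counts, key=lambda w: (-counts[w], w))[: self.top_n]

    # Sorting module 420 (lexicographic order, assumed)
    def sort(self, words: list[str]) -> str:
        return " ".join(sorted(words))

    # First computing module 430 (single-function MinHash, assumed);
    # raises ValueError from min() if the feature string is empty
    def compute(self, feature_string: str) -> int:
        return min(
            int.from_bytes(hashlib.sha1(w.encode("utf-8")).digest()[:8], "big")
            for w in feature_string.split()
        )

    # First building module 440: index the text in a matching bucket, if any
    def build_first(self, h: int, text_id: str) -> bool:
        if h in self.pool:
            self.pool[h].append(text_id)
            return True
        return False

    # Second building module 450: source text truncated; opening a new
    # bucket is an assumption
    def build_second(self, h: int, text_id: str) -> None:
        self.pool[h] = [text_id]

    def index(self, text_id: str, text: str) -> int:
        h = self.compute(self.sort(self.extract(text)))
        if not self.build_first(h, text_id):
            self.build_second(h, text_id)
        return h


if __name__ == "__main__":
    dev = IndexBuildingDevice()
    h = dev.index("a", "index bucket hash text")
    dev.index("b", "text hash bucket index")
    print(dev.pool[h])  # ['a', 'b']
```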