Large-scale keyword multimode matching method, device and equipment

A keyword multi-mode matching technology, applied in the field of multi-mode matching, that addresses the problem of low keyword retrieval efficiency.

Status: Inactive
Publication Date: 2019-04-02
南京中孚信息技术有限公司

AI Technical Summary

Problems solved by technology

[0004] In view of this, the object of the present invention is to provide a large-scale keyword multi-mode matching method, device and equipment that solve the technical problem in the prior art of low keyword retrieval efficiency in multi-mode matching.

Examples


Embodiment 1

[0028] As shown in Figure 1, the large-scale keyword multi-mode matching method provided by this embodiment of the present invention includes:

[0029] Step S102: scan the text to be tested according to a preset state machine, where the preset state machine is generated from a preset keyword set.

[0030] In addition, before the text to be tested is scanned according to the preset state machine, the method also includes configuring the preset state machine. As shown in Figure 2, this includes:

[0031] Step S202: generate the preset state machine from the preset keyword set, and assign each keyword in the preset keyword set a unique identification value that is less than or equal to the total number of keywords.

[0032] Since the original keyword set may contain a large number of duplicate keywords, the duplicates must be removed from the original keyword set to obtain the preset keyword set. Assuming that the number of keywords in the prese...
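The configuration and scanning steps above can be illustrated with a minimal sketch. It assumes a conventional Aho-Corasick construction (trie, failure links, merged outputs) and UTF-8 byte input; the function names `build_state_machine` and `scan`, and the use of Python dictionaries for transitions, are illustrative assumptions and are not taken from the patent text.

```python
from collections import deque


def build_state_machine(raw_keywords):
    """Deduplicate keywords, assign IDs 1..N, and build an AC automaton."""
    # Remove duplicates (and empty strings) while keeping first-seen order.
    keywords = [kw for kw in dict.fromkeys(raw_keywords) if kw]
    # Each keyword gets a unique ID no larger than the number of keywords.
    keyword_ids = {kw: i for i, kw in enumerate(keywords, start=1)}

    goto = [{}]    # goto[state][byte] -> next state
    fail = [0]     # failure state for each state
    output = [[]]  # keyword IDs recognised on entering each state

    # Insert every keyword into the trie, byte by byte.
    for kw, kid in keyword_ids.items():
        state = 0
        for b in kw.encode("utf-8"):
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        output[state].append(kid)

    # Breadth-first pass: compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return keywords, goto, fail, output


def scan(text, machine):
    """Return (end byte offset, keyword ID, keyword) for every match."""
    keywords, goto, fail, output = machine
    state = 0
    hits = []
    for pos, b in enumerate(text.encode("utf-8")):
        # Follow failure links until a transition on this byte exists.
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for kid in output[state]:
            hits.append((pos, kid, keywords[kid - 1]))
    return hits
```

For example, building the machine from `["he", "she", "his", "hers", "she"]` keeps four keywords with IDs 1 to 4, and scanning `"ushers"` reports `she` and `he` ending at byte offset 3 and `hers` ending at offset 5. The dictionary-based `goto` table here is only a stand-in for whatever transition storage the state machine actually uses.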

Embodiment 2

[0068] An embodiment of the present invention provides a large-scale keyword multi-mode matching device. As shown in Figure 3, it includes a scanning module 30, a recording module 31 and an output module 32.

[0069] Specifically, the scanning module 30 is used to scan the text to be tested according to the preset state machine, where the preset state machine is generated from a preset keyword set. The recording module 31 is used to: combine the current input state value and the current byte value into a first character string, and compute an index value from the first character string using a perfect hash function; compute a hash value from the current input state value and the current byte value using a hash function; and judge whether the hash value is equal to the preset hash value, where the preset hash value is stored in an array subscripted by the index value; if the hash value is equal to the preset hash value, the acquisition corr...
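The lookup performed by the recording module can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not say which perfect hash construction or hash function is used, so the sketch substitutes a small "hash and displace" minimal perfect hash with a seeded 64-bit FNV-1a. The names `make_key`, `build_table`, `lookup` and `CHECK_SEED`, the `"state:byte"` string layout, and the `(failure_state, output_state)` payload are all hypothetical.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF
CHECK_SEED = 0x9E3779B9  # arbitrary seed for the verification hash (assumption)


def fnv1a(data, seed=0):
    """64-bit FNV-1a with a seed XORed into the offset basis."""
    h = (FNV_OFFSET ^ seed) & MASK64
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h


def make_key(state, byte):
    """Combine state value and byte value into the 'first character string'."""
    return f"{state}:{byte}".encode("ascii")


def build_table(transitions):
    """transitions: non-empty dict {(state, byte): payload}."""
    n = len(transitions)
    if n == 0:
        raise ValueError("at least one transition is required")
    buckets = [[] for _ in range(n)]
    for (state, byte), payload in transitions.items():
        key = make_key(state, byte)
        buckets[fnv1a(key) % n].append((key, payload))
    order = sorted(range(n), key=lambda i: len(buckets[i]), reverse=True)

    displace = [0] * n    # per-bucket displacement (negative = direct slot)
    slots = [None] * n    # slots[index] = (preset hash value, payload)
    for bi in order:
        bucket = buckets[bi]
        if len(bucket) <= 1:
            break
        d = 1
        while True:  # find a seed that sends the bucket to distinct free slots
            idx = [fnv1a(k, d) % n for k, _ in bucket]
            if len(set(idx)) == len(idx) and all(slots[i] is None for i in idx):
                break
            d += 1
        displace[bi] = d
        for i, (k, payload) in zip(idx, bucket):
            slots[i] = (fnv1a(k, CHECK_SEED), payload)

    # Single-key buckets take the remaining free slots directly.
    free = [i for i in range(n) if slots[i] is None]
    for bi in order:
        if len(buckets[bi]) == 1:
            i = free.pop()
            displace[bi] = -i - 1
            k, payload = buckets[bi][0]
            slots[i] = (fnv1a(k, CHECK_SEED), payload)
    return displace, slots


def lookup(table, state, byte):
    """Return the stored payload, or None if (state, byte) is not a key."""
    displace, slots = table
    n = len(slots)
    key = make_key(state, byte)
    d = displace[fnv1a(key) % n]
    index = -d - 1 if d < 0 else fnv1a(key, d) % n
    preset_hash, payload = slots[index]
    # A perfect hash maps unknown keys to some valid slot, so verify the hash.
    return payload if preset_hash == fnv1a(key, CHECK_SEED) else None
```

The comparison against the stored hash value matters because a perfect hash function is only collision-free on the keys it was built from; any other (state, byte) pair still lands on some occupied slot, and the stored hash is what distinguishes a genuine entry from an unrelated one. Because every slot is occupied, the array needs no spare capacity for collision handling.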

Embodiment 3

[0072] An embodiment of the present invention provides a large-scale keyword multi-mode matching device. As shown in Figure 4, the large-scale keyword multi-mode matching device 4 includes a memory 41 and a processor 42. A computer program that can run on the processor is stored in the memory, and when the processor executes the computer program, it implements the steps of the method provided by Embodiment 1 above.

[0073] Referring to Figure 4, the large-scale keyword multi-mode matching device also includes a bus 43 and a communication interface 44. The processor 42, the communication interface 44 and the memory 41 are connected through the bus 43, and the processor 42 is used to execute executable modules stored in the memory 41, for example a computer program.

[0074] The memory 41 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The ...


Abstract

The invention provides a large-scale keyword multi-mode matching method, device and equipment. The method comprises the following steps: scanning a text to be tested according to a preset state machine, where the preset state machine is generated from a preset keyword set; combining the current input state value and the current byte value of the text to be tested into a first character string, computing an index value from it using a perfect hash function, and computing a hash value from the current input state value and the current byte value using a hash function; obtaining a recorded failure state value and an output state value by comparing the hash value with a preset hash value stored in an array subscripted by the index value; and mapping the recorded failure state value and output state value to their corresponding keywords and outputting the keywords found in the text to be tested. Applying a perfect hash function in this way improves the retrieval efficiency of keyword multi-mode matching.

Description

Technical field

[0001] The present invention relates to the technical field of multi-mode matching, in particular to a large-scale keyword multi-mode matching method, device and equipment.

Background

[0002] The AC (Aho-Corasick) algorithm is a classic multi-mode matching algorithm and is widely used in industrial production environments. During state transitions, a state-machine-based multi-mode matching algorithm must quickly retrieve the output state from the input state and the trigger condition in order to improve pattern matching efficiency.

[0003] Existing AC implementations usually use a B+ tree or a hash table to retrieve keywords. B+ tree storage is less efficient to search than a hash table, while a commonly used hash table must allocate redundant storage space to avoid collisions. Large-scale state machines are also more prone to collisions when an ordinary hash is constructed, so retrieval efficiency is low.

Contents of th...


Application Information

IPC(8): G06F16/901, G06F16/9032
Inventor: 袁春峰曲志峰纪翀楼方平
Owner: 南京中孚信息技术有限公司