Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Short message search method and system based on suffix arrays

A suffix array and short message technology, applied in the field of data search, can solve the problem of long time consumption of short message search method, and achieve the effect of fast query speed, high query efficiency and improved search speed.

Inactive Publication Date: 2017-08-11
SYSU CMU SHUNDE INT JOINT RES INST +1
View PDF2 Cites 4 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a short message search method based on a suffix array in order to solve the shortcoming that the short message search method provided by the above prior art takes a long time

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Short message search method and system based on suffix arrays
  • Short message search method and system based on suffix arrays
  • Short message search method and system based on suffix arrays

Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0027] like figure 1 Shown, the method provided by the invention comprises the following steps:

[0028] S1. Construct a suffix array for each short message in the short message list according to its short message string content, and then sort each suffix array item in all the suffix arrays constructed according to the preset rules;

[0029] S2. When receiving a keyword for searching short messages, according to the order of the received characters, each character in the received keyword is used as an index for binary search in turn;

[0030] S3. Use the i-th character in the keyword as an index to perform a binary search in all sorted suffix array items, and use the suffix array whose first character is the index corresponding to the i-th search result; i's The initial value is 1;

[0031] S4. Set i=i+1 and then use the i-th character in the keyword as an index to perform a binary search in the suffix array item contained in the i-1 search result, and then match the first c...

Embodiment 2

[0036] This embodiment provides a system applying the scheme of Embodiment 1, such as figure 2 As shown, the specific scheme is as follows:

[0037] Including string reading module, construction module, sorting module and search module;

[0038] Wherein the string reading module is used to read the string content of each short message in the short message list;

[0039] The construction module is used to construct a suffix array for each text message in the text message list;

[0040] The sorting module is used to sort each suffix array item in all constructed suffix arrays;

[0041] The search module is used to perform a binary search in all sorted suffix array items according to keywords, and then use the suffix array corresponding to the searched suffix array items as the search result.

Embodiment 3

[0043] This embodiment is an illustration of the scheme of embodiment 1, as figure 2 As shown, the specific process is as follows:

[0044] Step 1. First, construct suffix arrays for the character string contents of the two short messages, as shown in Table 1 and Table 2 respectively.

[0045]

[0046] Step 2. After constructing a suffix array for each short message in the short message list, sort each suffix array item in all the constructed suffix arrays according to a preset rule.

[0047] The rule described here is to sort by the first letter of the phonetic alphabet of the first Chinese character; obtain the list of the suffix array of Table 3;

[0048] table 3

[0049]

[0050] Step 3. When receiving the keyword "eat*fan" (*represents any character) for searching text messages, first, according to the first character "eat" input by the user, the pinyin initial letter "C" of the character "eat" is " is compared with the pinyin initial letter "M" of the first cha...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention relates to a short message search method based on suffix arrays. The method comprises the steps that S1, a suffix array is constructed for each short message in a short message list, and then all suffix array items in all the suffix arrays obtained through construction are ordered; S2, when a keyword for searching for a short message is received, all characters in the received keyword are sequentially used as indexes for binary search according to a character receiving order; S3, the i(th) character in the keyword is used as an index to perform binary search in all the suffix array items which are ordered, and the suffix array corresponding to the suffix array items with the first character being the index is used as an i(th) search result; S4, it is assumed that i=i+1, the i(th) character in the keyword is used as an index to perform binary search in the suffix array items contained in an (i-1)th search result, and then the suffix array corresponding to the suffix array items with the first character being the index is used as the i(th) search result; and S5, the step S4 is executed repeatedly till i is greater than n, and at the moment the short message corresponding to the i(th) search result is used as a short message search result to be output, wherein n is the number of characters contained in the keyword.

Description

technical field [0001] The present invention relates to the field of data search, and more specifically, to a short message search method and system based on a suffix array. Background technique [0002] The suffix array was originally proposed as an alternative to the suffix tree. Compared with the suffix tree, it requires less space to store the suffix array and has a wider range of applications. After the suffix array was proposed, the suffix array, as an important index data structure, is widely used in fields such as bioinformatics, full-text indexing, string matching, frequent string mining, sequence analysis, and cluster analysis. [0003] At present, instant messaging devices generally provide the function of fuzzily searching short messages. Fuzzy search refers to the process of searching without using the full name of the search target as a keyword, but using the partial name of the search target as a keyword. How to quickly and fuzzily search for short messages ...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
IPC IPC(8): G06F17/30G06F17/27
CPCG06F16/3331G06F16/3344G06F40/30
Inventor 邵长飞劳斌
Owner SYSU CMU SHUNDE INT JOINT RES INST
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products