Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Method for multi-thread parallel construction of suffix array and system

A suffix array and multi-threading technology, which is applied in multi-programming devices, electrical digital data processing, special data processing applications, etc., can solve problems such as unsatisfactory fast processing and difficult to meet operator's target expectations, and achieve fast speed and operation fast effect

Inactive Publication Date: 2018-11-13
FOSHAN SHUNDE SUN YAT SEN UNIV RES INST +2
View PDF6 Cites 8 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, with the explosive growth of data scale, existing serial methods and systems cannot meet the needs of fast processing when dealing with large-scale string data, especially under the memory model of multi-core computers, it is difficult to achieve operation target expectations

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Method for multi-thread parallel construction of suffix array and system
  • Method for multi-thread parallel construction of suffix array and system

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0059] Among them, the following technical terms may be used in the description of the present invention, which are explained here:

[0060] Character set: A character set Σ is a set that establishes a total order relationship, that is, any two different elements α and β in Σ can be compared in size, or αβ. The elements in the character set Σ are called characters, and the smallest character is '$'. The size of the character set involved in the present invention can be a constant O(1) or an integer O(n).

[0061] String: A string X with a length of n is an array X[0,n-1] formed by arranging n characters belonging to the character set Σ in order of their positions. The terminator of X is fixed as '$', and '$' does not appear elsewhere in X.

[0062] Substring: The substring X[i, j] of the string X, i≤j, represents a string from position i to position j in the X string, that is, the characters X[i], X[i+1] ,…, A string composed of X[j].

[0063] Suffix: A suffix of the strin...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention discloses a method for multi-thread parallel construction of a suffix array and a system. The method comprises the following steps: scanning a character string X, calculating the types of characters and suffixes by using an L / S type recognition method, and recording in an array t; scanning the array t, finding out the positions where LMS characters appear, obtaining first character pointers of LMS substrings and recording the pointers of the LMS substrings with an array P1; performing parallel induction and sorting on all LMS substrings in the X by the P1, a B and an SA and saving in an SA1; according to the sorting result, performing multi-thread parallel renaming on the LMS substrings in the X to form an X1; checking whether each character in the X1 is unique, if yes, directly sorting suffixes of the X1 to calculate a suffix array of the X1 and saving in the SA1; and finally calculating a suffix array of the X and saving in the SA according to the suffix array of the X1 saved in the SA1. The invention has high running speed, can be matched with the memory of a multi-core computer, and is suitable for constructing a suffix array of a large-scale string.

Description

technical field [0001] The invention relates to the field of string suffix array construction, in particular to a method and system for constructing a suffix array in parallel with multiple threads. Background technique [0002] Suffix Array (Suffix Array, SA) is an important data structure in computer science. It has the characteristics of compact structure and small space occupation. It is widely used in many fields such as full-text indexing, gene matching and data compression. Given any input string text X[0,n-1] of length n, referred to as string X, all characters from any position i in X to end position n-1 form a character substring X [i,n-1], the substring is called a suffix (Suffix) of the string X. Obviously, a string X with a length of n contains n suffixes, and these n suffixes are stored in an integer array in ascending lexicographical order, then this array is called the suffix array SA of the string X. [0003] In recent years, the memory space of general-pu...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
Patent Type & Authority Applications(China)
IPC IPC(8): G06F9/46G06F17/30
CPCG06F9/46
Inventor 劳斌徐文涛解静仪农革
Owner FOSHAN SHUNDE SUN YAT SEN UNIV RES INST
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products