Third-generation sequencing data overlap detection method and system

A technology for overlap detection in sequencing data, applied in electrical digital data processing, sequence analysis, multiprogramming arrangements, etc. It has the effect of improving parallel computing speed and thread scalability.

Pending Publication Date: 2020-06-16
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

In addition, the Minimap algorithm uses a sorting algorithm when building its index, but this step is not parallelized.
Finally, the hash function accounts for most of the program's computation time in the Minimap algorithm. The operations in the hash function can be vectorized, but Minimap currently does not use vector processor resources for low-level acceleration.

Examples

Embodiment 1

[0036] Figure 1 shows a flow chart of the third-generation sequencing data overlap detection method of this embodiment.

[0037] The specific implementation of the third-generation sequencing data overlap detection method of this embodiment is described below with reference to Figure 1.

[0038] As shown in Figure 1, this embodiment provides a third-generation sequencing data overlap detection method, including:

[0039] Step S101: Receive all DNA sequences of the third-generation sequencing data and sort the DNA sequences by length.

[0040] Sorting reduces the difference between the computing tasks of adjacent sequences. Because the parallel optimization includes vectorized optimization, most of the lanes in a vector register will sit idle if the lengths of adjacent sequences differ too much. Sorting is therefore critical for keeping the parallel implementation load-balanced.
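The effect described in [0040] can be illustrated with a small sketch. It is not taken from the patent: the function name `lane_utilization`, the lane count of 4, and the read lengths are all assumptions chosen for illustration. The model assumes that consecutive sequences share one vector register and that each batch runs for as many steps as its longest sequence.

```python
def lane_utilization(lengths, lanes=4):
    """Fraction of vector-lane slots doing useful work.

    Assumed model (illustrative only): sequences are processed in
    consecutive batches of `lanes`, and every batch occupies all lanes
    for as many steps as its longest sequence.
    """
    useful = sum(lengths)
    occupied = 0
    for start in range(0, len(lengths), lanes):
        batch = lengths[start:start + lanes]
        # Shorter sequences in the batch leave their lanes idle
        # until the longest one finishes.
        occupied += max(batch) * lanes
    return useful / occupied


# Hypothetical read lengths that alternate between short and long.
read_lengths = [100, 9000, 120, 8800, 110, 9100, 90, 8900]

unsorted_util = lane_utilization(read_lengths)
sorted_util = lane_utilization(sorted(read_lengths))
print(f"unsorted: {unsorted_util:.2f}, sorted: {sorted_util:.2f}")
```

Under this toy model, sorting by length raises lane utilization from about 0.50 to about 0.98 for the sample lengths, because each batch then contains sequences of similar length.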

[0041] Step S102: According to the p...

Embodiment 2

[0065] Figure 5 shows a schematic structural diagram of the third-generation sequencing data overlap detection system of this embodiment.

[0066] The structure and principle of the third-generation sequencing data overlap detection system of this embodiment are described below with reference to Figure 5:

[0067] As shown in Figure 5, the third-generation sequencing data overlap detection system of this embodiment includes:

[0068] (1) a sequencing data preprocessing module, which is used to receive all DNA sequences of the third-generation sequencing data and sort the DNA sequences by length;

[0069] Sorting reduces the difference between the computing tasks of adjacent sequences. Because the parallel optimization includes vectorized optimization, most of the lanes in a vector register will sit idle if the lengths of adjacent sequences differ too much. Sorting is therefore critical for keeping the parallel implementation load bala...

Embodiment 3

[0091] This embodiment is a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the third-generation sequencing data overlap detection method shown in Figure 1.

[0092] In this embodiment, all DNA sequences are allocated to a preset number of parallel threads according to the strategy that each thread processes an equal total amount of DNA data, so that the load across threads is balanced and the speedup of the multithreaded parallel implementation is preserved.
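One way to read the allocation strategy in [0092] is as a split by total base count rather than by sequence count. The sketch below is an assumption, not the patent's implementation: `partition_by_bases` is a hypothetical helper that cuts the length-sorted list into contiguous chunks, and the sample lengths are made up.

```python
def partition_by_bases(lengths, n_threads):
    """Split sequence indices into n_threads contiguous chunks whose
    total base counts are roughly equal (illustrative strategy only)."""
    total = sum(lengths)
    chunks = [[] for _ in range(n_threads)]
    running = 0  # bases assigned so far
    thread = 0
    for i, length in enumerate(lengths):
        # Move on to the next thread once the running total has reached
        # this thread's cumulative share of the data.
        while thread < n_threads - 1 and running >= total * (thread + 1) / n_threads:
            thread += 1
        chunks[thread].append(i)
        running += length
    return chunks


# Hypothetical sequence lengths, already sorted as in step S101.
lengths = [1000, 1000, 1000, 1000, 2000, 2000, 4000, 4000]
chunks = partition_by_bases(lengths, n_threads=2)
loads = [sum(lengths[i] for i in chunk) for chunk in chunks]
print(chunks, loads)
```

Splitting by sequence count would give each thread four sequences but loads of 4000 and 12000 bases; splitting by bases gives the first thread six short sequences and the second thread two long ones, with equal loads. Contiguous chunks are used here so that each thread still sees sequences of similar length, which is a design choice of this sketch.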

[0093] This embodiment constructs a reference gene hash index table based on a double-array structure. The reference gene hash index table is divided into two arrays: the index array stores the positions at which the minimizers corresponding to different hash values are stored in the structure array, and in the structure array the position information of the minimizer is s...
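The double-array layout in [0093] can be sketched as follows. The concrete layout is an assumption: the index array is modeled as a sorted list of `(hash_value, start)` pairs, the structure array as a flat list of `(seq_id, position)` pairs grouped by hash value, and the names `build_double_array_index` and `lookup` are hypothetical.

```python
from bisect import bisect_left


def build_double_array_index(entries):
    """entries: iterable of (hash_value, seq_id, position) tuples.

    Returns (index_array, structure_array):
      index_array     -- sorted (hash_value, start) pairs, where start is the
                         first slot for that hash in structure_array
      structure_array -- (seq_id, position) pairs grouped by hash value
    """
    index_array, structure_array = [], []
    for hash_value, seq_id, position in sorted(entries):
        if not index_array or index_array[-1][0] != hash_value:
            index_array.append((hash_value, len(structure_array)))
        structure_array.append((seq_id, position))
    return index_array, structure_array


def lookup(index_array, structure_array, hash_value):
    """Return every (seq_id, position) stored for hash_value."""
    i = bisect_left(index_array, (hash_value,))
    if i == len(index_array) or index_array[i][0] != hash_value:
        return []
    start = index_array[i][1]
    # The next index entry (or the end of the array) bounds this group.
    end = index_array[i + 1][1] if i + 1 < len(index_array) else len(structure_array)
    return structure_array[start:end]


# Hypothetical minimizer records: (hash value, sequence id, position).
entries = [(42, 0, 10), (7, 1, 3), (42, 2, 5), (7, 0, 8), (99, 1, 0)]
index_array, structure_array = build_double_array_index(entries)
print(lookup(index_array, structure_array, 42))
```

Because all occurrences of one hash value sit in a contiguous slice of the structure array, a query touches one index entry and then reads a single consecutive range.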

Abstract

The invention provides a third-generation sequencing data overlap detection method and system. The method comprises the following steps: receiving all DNA sequences of the third-generation sequencing data and sorting the DNA sequences by length; allocating all DNA sequences to a preset number of parallel threads according to the strategy that each thread processes an equal total amount of DNA data; for each thread, finding the subsequence with the minimum hash value in each window of every DNA sequence and taking that subsequence as a minimizer; indexing all minimizers by hash value and constructing a reference gene hash index table based on a double-array structure, wherein the table is divided into two arrays, the index array stores the positions at which the minimizers corresponding to different hash values are stored in the structure array, and the structure array stores the position information of the minimizers; and performing DNA sequence overlap detection according to the reference gene hash index table based on the double-array structure. The efficiency of sequencing data overlap detection can thereby be improved.
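The minimizer step in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's or Minimap's code: `kmer_hash` is a stand-in mixing function, the k-mer length `k=5` and window size `w=4` are arbitrary, and the input sequence is invented.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer):
    """Stand-in hash (an assumption): 2-bit encode the k-mer, then mix."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENCODE[base]
    code = (code * 2654435761) & 0xFFFFFFFF
    return code ^ (code >> 16)


def minimizers(seq, k=5, w=4):
    """Return (hash, position) of the minimum-hash k-mer in each window
    of w consecutive k-mers, skipping repeats from overlapping windows."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        # Leftmost k-mer with the smallest hash in this window.
        offset = min(range(w), key=window.__getitem__)
        entry = (window[offset], start + offset)
        if not picked or picked[-1] != entry:
            picked.append(entry)
    return picked


seq = "ACGTTGCAAGCTTAGC"  # hypothetical read
mins = minimizers(seq)
for hash_value, position in mins:
    print(position, seq[position:position + 5], hash_value)
```

Adjacent windows usually select the same k-mer, so the list of minimizers is much shorter than the list of all k-mers, which is what keeps the resulting hash index compact.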

Description

Technical field

[0001] The invention belongs to the field of sequencing data processing, and in particular relates to a method and system for detecting overlaps in third-generation sequencing data.

Background

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] Third-generation sequencing is a new generation of DNA sequencing technology. The average length of DNA sequences has increased from 200 to 10,000. Long DNA sequences can carry richer genetic information and speed up the subsequent assembly process. For third-generation sequencing data, DNA overlap detection is an important sequence analysis step. An overlap is the portion where the characters of two sequences match, where DNA can be regarded as a string of A/C/G/T characters.

[0004] The core of the existing DNA sequence overlap detection algorithm is to find t...

Application Information

IPC(8): G16B30/10, G16B50/30, G06F9/50
CPC: G16B30/10, G16B50/30, G06F9/505, G06F2209/5018, Y02D10/00
Inventor: 刘卫国, 槐敏涵, 产院东
Owner: SHANDONG UNIV