GPU acceleration-based DNA sequence compression method and system

A DNA sequence compression method and technology, applied in the field of genes, that addresses problems such as the inability to realize DNA sequence compression and achieves the effect of accelerating compression speed.

Inactive Publication Date: 2018-07-17
SHENZHEN UNIV

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a GPU-accelerated DNA sequence compression method and sy

Examples

Embodiment Construction

[0050] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0051] The embodiments of the present invention relate to matching DNA base sequences against a reference genome to eliminate redundancy, and to eliminating the redundant parts of DNA metadata. Reference genome-based matching of DNA base sequences, encoding of the simplified metadata, encoding of the matching results, and encoding of the quality scores are all implemented on the GPU. While maintaining a reasonable compression ratio, the GPU is used to accelerate the compression of DNA sequences.
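
As an illustration of eliminating redundant parts of DNA metadata, the following is a minimal sketch only. The text above does not describe how the metadata is simplified, so the sketch swaps in a simple stand-in technique: each read header is split into fields, and any field equal to the same field of the previous header is blanked. The function names, the ":" separator, and the sample headers are all hypothetical, and the code runs on the CPU in plain Python.

```python
def simplify_metadata(headers, sep=":"):
    """Blank every field that repeats the same field of the previous header.

    Assumption (not from the source): fields are non-empty, so an empty
    field can safely mean "same as the previous header".
    """
    simplified = []
    prev = []
    for header in headers:
        fields = header.split(sep)
        if any(f == "" for f in fields):
            raise ValueError("empty fields are not supported by this sketch")
        out = [
            "" if i < len(prev) and f == prev[i] else f
            for i, f in enumerate(fields)
        ]
        simplified.append(sep.join(out))
        prev = fields
    return simplified


def restore_metadata(simplified, sep=":"):
    """Inverse of simplify_metadata: refill blanked fields from the previous header."""
    restored = []
    prev = []
    for line in simplified:
        fields = line.split(sep)
        out = [
            prev[i] if f == "" and i < len(prev) else f
            for i, f in enumerate(fields)
        ]
        restored.append(sep.join(out))
        prev = out
    return restored


# Hypothetical FASTQ-style headers, used only for illustration.
headers = ["@SRR001:1:100:7", "@SRR001:1:100:8", "@SRR001:1:101:9"]
simplified = simplify_metadata(headers)
print(simplified)
print(restore_metadata(simplified) == headers)
```

Blanked fields leave long runs of separators, which a later entropy-coding stage can compress well; that is the only design point the sketch is meant to show.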

[0052] Since the end of the 20th century, with the continuous development of biological sequencing ...

Abstract

The invention applies to the technical field of genes and provides a GPU acceleration-based DNA sequence compression method. In the method, a central processing unit (CPU) uses a template chain algorithm to simplify metadata and sends the simplified metadata to a GPU; the GPU matches a DNA base sequence against a reference genome using a sparse index algorithm to obtain a matching result; and the GPU compresses the matching result, the simplified metadata and a quality score using a Burrows-Wheeler transform, a move-to-front transform and a range (interval) encoder to obtain a compressed DNA sequence. The CPU and the GPU run asynchronously in combination, the DNA sequence is compressed on the GPU, and the compute units of the GPU are used to increase the compression speed.
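
The Burrows-Wheeler transform and move-to-front stages named in the abstract can be sketched as follows. This is a naive CPU-only illustration in Python, not the GPU implementation of the invention; the function names, the "$" end marker, and the "$ACGT" alphabet are assumptions made for the example, and the final range-encoding stage is omitted.

```python
def bwt(text, end="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    assert end not in text, "end marker must not occur in the input"
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, end="$"):
    """Naive inverse: rebuild the sorted rotation table column by column."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    row = next(r for r in table if r.endswith(end))
    return row[:-1]


def mtf_encode(text, alphabet):
    """Move-to-front: emit each symbol's current rank, then move it to the front."""
    symbols = list(alphabet)
    out = []
    for ch in text:
        idx = symbols.index(ch)
        out.append(idx)
        symbols.insert(0, symbols.pop(idx))
    return out


def mtf_decode(indices, alphabet):
    """Inverse move-to-front."""
    symbols = list(alphabet)
    out = []
    for idx in indices:
        ch = symbols.pop(idx)
        out.append(ch)
        symbols.insert(0, ch)
    return "".join(out)


alphabet = "$ACGT"            # assumed alphabet: end marker plus the four bases
read = "ACGTACGTACGT"         # hypothetical input
transformed = bwt(read)
ranks = mtf_encode(transformed, alphabet)
# Round trip: undo move-to-front, then undo the BWT.
print(inverse_bwt(mtf_decode(ranks, alphabet)) == read)
```

The BWT groups identical symbols into runs, and move-to-front turns those runs into many small ranks (mostly zeros), which gives the entropy coder that follows a skewed distribution to exploit.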

Description

Technical field

[0001] The invention belongs to the field of gene technology, and in particular relates to a reference genome-based DNA sequence compression method and system accelerated by a graphics processing unit (GPU).

Background

[0002] Researchers have studied DNA sequence compression tools and obtained some results. Commonly used reference genome-based compression tools include LW-FQZip2, Quip(-r), DeeZ and CRAM. Among them:

[0003] LW-FQZip2 is a reference genome-based DNA sequence compression tool. It builds a sparse index of the reference genome, locates each base sequence at the corresponding sparse index position, and matches it against the reference genome, taking insertions, deletions, and mismatches into account. It uses threads to divide a long-running program into several tasks that are processed in the background, realizing the parallelization of lightweight DNA sequence matching ap...
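
The idea of a sparse index over the reference genome, as described for LW-FQZip2 above, can be illustrated with a minimal sketch. It is an assumption-laden toy and not the LW-FQZip2 implementation: the function names, the k-mer length `k`, the sampling `step`, and the sample sequences are hypothetical, and only substitutions (mismatches) are handled, not insertions or deletions.

```python
def build_sparse_index(reference, k=4, step=2):
    """Index only every `step`-th k-mer of the reference (hence "sparse")."""
    index = {}
    for pos in range(0, len(reference) - k + 1, step):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def match_read(read, reference, index, k=4, max_mismatches=2):
    """Return (start, [(offset, base), ...]) for the best placement, or None.

    Each k-mer of the read is used as a seed; every seed hit proposes a start
    position, which is then verified base by base against the reference.
    """
    best = None
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            mismatches = [
                (i, b) for i, b in enumerate(read)
                if reference[start + i] != b
            ]
            if len(mismatches) <= max_mismatches and (
                best is None or len(mismatches) < len(best[1])
            ):
                best = (start, mismatches)
    return best


def reconstruct(reference, start, length, mismatches):
    """Rebuild the read from its match result."""
    bases = list(reference[start:start + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTTGCAACGGTCAT"   # hypothetical reference
read = "TCAACGGT"                # reference[5:13] with its first base changed
index = build_sparse_index(reference)
result = match_read(read, reference, index)
print(result)
print(reconstruct(reference, result[0], len(read), result[1]) == read)
```

Storing only the start position and the differing bases, instead of the read itself, is what removes the redundancy between the read and the reference. Sampling the index trades memory for sensitivity: a read is found only if at least one of its k-mers falls on a sampled position.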


Application Information

IPC(8): G06F19/28
Inventor: 朱泽轩, 彭聪, 孙怡雯
Owner: SHENZHEN UNIV