
Method and device for structural variation detection and storage medium

A structural variation and signal technology, applied in the field of bioinformatics, that addresses problems such as reduced accuracy, difficulty detecting interchromosomal structural variation, and long run times; it achieves a high detection rate and precision and reduces false-positive signals.

Active Publication Date: 2022-07-12
SHENZHEN GENEPLUS CLINICAL LAB
5 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

Detection methods based on next-generation sequencing are limited by the length of the sequencing reads and the length of the template. Many methods can therefore detect only variations within the length of the template, often within a few hundred bp. Larger variations require resource-intensive strategies such as de novo assembly, and the diversity of assembly results makes it difficult for these methods to determine the original content of the sequence.
For example, when detecting fusion breakpoints, many methods cluster SR signals, as in BreakSeek, an indel breakpoint detection algorithm based on a Bayesian model. If the depth is high, the iteration time becomes longer; if the depth is low, accuracy is strongly affected.
[0006] The biggest defect of traditional next-generation sequencing-based structural variation detection methods is their poor performance in identifying large or even very large structural variations. Most methods can only detect structural variations within a few thousand bp, and their ability to detect structural variations exceeding the insert size is poor.
For example, inGap-sv, a detection method based on depth differences, identifies structural variations through DP, SR, SU and the number of normal read pairs, and corrects the results with depth information, but cannot identify more complex or interchromosomal structural variations. Assembly methods such as manta and SV-aba are difficult to apply in highly repetitive regions and take a long time. Classic methods such as Pindel and Delly detect small indels well, but perform poorly on structural variations that exceed the length of the template fragment.
Another problem that is difficult to overcome with traditional methods is that obtaining more accurate fusion breakpoints generally requires clustering or partial assembly, a step where discrepancies easily arise.
[0007] Next-generation sequencing technology currently dominates the market and can be expected to do so for a long time to come. Therefore, accurate breakpoint detection from next-generation sequencing data, and the identification of large-scale and interchromosomal structural variation, remain both a focus and a difficulty of research in this field.



Examples


Embodiment

[0087] The structural variation detection method in this example is as follows:

[0088] Input: preprocessed bam file, reference sequence;

[0089] 1. Data acquisition step: acquire the bam file of the next-generation sequencing data of the sample to be tested, and calculate the basic information of the bam file: insert size mean and standard deviation, insert size max (insert size mean + 3.96 * insert size std), and read length;
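The insert size threshold in step 1 can be illustrated with a minimal Python sketch. The function name `insert_size_stats` and the use of the population standard deviation are assumptions made for illustration; the source does not say which estimator is used or how insert sizes are sampled from the bam file.

```python
import statistics

def insert_size_stats(insert_sizes, k=3.96):
    """Return (mean, std, max) where max = mean + k * std.

    `insert_sizes` is a plain list of observed insert sizes; how they are
    collected from the bam file is not covered here. The population standard
    deviation is an assumption, not something the source states.
    """
    mean = statistics.fmean(insert_sizes)
    std = statistics.pstdev(insert_sizes)
    return mean, std, mean + k * std

# Hypothetical insert sizes, for illustration only
mean, std, insert_size_max = insert_size_stats([300, 310, 290, 300])
```

Read pairs whose insert size exceeds `insert_size_max` would then be candidates for the DP signal described in step 2.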

[0090] 2. Signal classification step: reads are extracted from the bam file in parallel over intervals of length 75k, and the abnormal reads are classified into DP signals (insert size > insert size max, or the two paired reads fall on two different chromosomes), SR signals (reads with soft clipping), and SU signals (only one read of the pair aligns to the reference sequence); after extraction they are written to a temporary file;
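The classification rules in step 2 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the `Read` record, its field names, and the helper names are hypothetical, "75k" is assumed to mean 75,000 bp, and a read is allowed to carry more than one signal because the source does not state a priority order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Read:
    # Hypothetical minimal record; a real pipeline would read these from the bam file
    chrom: Optional[str]
    mate_chrom: Optional[str]
    insert_size: int
    cigar: str
    mapped: bool = True
    mate_mapped: bool = True

def windows(chrom_length: int, size: int = 75_000) -> List[Tuple[int, int]]:
    """Split a chromosome into half-open intervals of at most `size` bp."""
    return [(start, min(start + size, chrom_length))
            for start in range(0, chrom_length, size)]

def classify_read(read: Read, insert_size_max: float) -> List[str]:
    """Return the abnormal-read signals (DP, SR, SU) that apply to one read."""
    signals = []
    if read.mapped and read.mate_mapped:
        # DP: mates on different chromosomes, or insert size above the threshold
        if read.chrom != read.mate_chrom or abs(read.insert_size) > insert_size_max:
            signals.append("DP")
    if read.mapped and "S" in read.cigar:
        # SR: the CIGAR string contains a soft-clip operation
        signals.append("SR")
    if read.mapped != read.mate_mapped:
        # SU: exactly one read of the pair aligns to the reference
        signals.append("SU")
    return signals
```

For example, `classify_read(Read("chr1", "chr2", 0, "100M"), 400)` yields a DP signal, while a properly paired read with CIGAR `60M40S` yields only an SR signal.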

[0091] 3. DP signal clustering analysis step: cluster the DP signals extracted in step 2, find DP signal cluste...
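The text of step 3 is truncated here, so the clustering criterion actually used is not available. Purely as an illustration of what clustering DP signals can look like, the sketch below groups signals by chromosome pair and then applies a simple gap-based rule on the read position. The function name, the tuple layout, and the `max_gap` value are all assumptions.

```python
from collections import defaultdict

def cluster_dp_signals(signals, max_gap=500):
    """Group DP signals into clusters; each cluster is one candidate.

    `signals` is an iterable of (chrom, pos, mate_chrom, mate_pos) tuples.
    Signals sharing a chromosome pair are sorted by position, and a new
    cluster starts whenever the gap to the previous read exceeds `max_gap`.
    The gap rule and its default are illustrative, not taken from the source.
    """
    groups = defaultdict(list)
    for chrom, pos, mate_chrom, mate_pos in signals:
        groups[(chrom, mate_chrom)].append((pos, mate_pos))

    clusters = []
    for (chrom, mate_chrom), points in groups.items():
        points.sort()
        current = [points[0]]
        for point in points[1:]:
            if point[0] - current[-1][0] <= max_gap:
                current.append(point)
            else:
                clusters.append((chrom, mate_chrom, current))
                current = [point]
        clusters.append((chrom, mate_chrom, current))
    return clusters
```

This sketch ignores the mate position when splitting clusters; a more careful version would also require the mates to lie close together.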


Abstract

The invention discloses a structural variation detection method and device and a storage medium. The method comprises: obtaining an alignment file, extracting reads from the alignment file in intervals of a set length, and classifying abnormal reads into DP signals, SR signals and SU signals; clustering the DP signals, treating each cluster as a structural variation candidate, and performing local assembly and realignment on each cluster; finding chimeric alignments among the SR signals and performing assembly and realignment; and, for the two realignment results, calculating the variant depth on the left and right sides of each fusion breakpoint and identifying the structural variation type. By clustering the DP signals and then assembling and realigning them, the method reduces false-positive signals within clusters; the SR signal analysis supplements this, so that the detection rate and precision of the overall result are higher. The method can identify structural variations such as deletions, inversions, duplications, intrachromosomal translocations and interchromosomal translocations, and it outputs the microhomology sequences and short templated sequences near the breakpoints.
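The abstract does not spell out how structural variation types are identified from the realignment results. As a rough illustration only, the sketch below applies the widely used read-pair signature convention for standard paired-end libraries (mates expected to face each other). The function name and arguments are hypothetical, and this is not presented as the rule used by the invention.

```python
def sv_type_from_pair(chrom1, pos1, strand1, chrom2, pos2, strand2):
    """Guess a structural variation type from two aligned ends.

    Strands are "+" or "-". The rules follow the common paired-end
    convention and are an assumption, not the patented method:
      - different chromosomes        -> interchromosomal translocation
      - same strand on both ends     -> inversion
      - left "+" and right "-"       -> deletion (ends too far apart)
      - left "-" and right "+"       -> duplication
    """
    if chrom1 != chrom2:
        return "interchromosomal translocation"
    if strand1 == strand2:
        return "inversion"
    # Order the two ends by coordinate before checking orientation
    (_, left_strand), (_, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == "+" and right_strand == "-":
        return "deletion"
    return "duplication"
```

Intrachromosomal translocations cannot be separated from the other same-chromosome classes with this signature alone, which is one reason assembly and realignment around the breakpoint are useful.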

Description

Technical field
[0001] The present application relates to the technical field of bioinformatics, and in particular to a method, device and storage medium for structural variation detection.
Background
[0002] Structural variation (SV) includes deletions, insertions, inversions, duplications, and translocations within the genome, as well as complex structural variations composed of these simple types. After more than ten years of development, research on structural variation detection methods based on next-generation sequencing data has become increasingly mature, but some difficulties still cannot be completely overcome; these include determining precise breakpoints and identifying larger structural variations and structural variations between chromosomes. With the rapid development of bioinformatics in recent years, various detection methods addressing these problems have been widely proposed, such as switching to third-generation long-re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/20, G16B20/50, G16B30/00, G16B40/10
CPC: G16B20/20, G16B20/50, G16B30/00, G16B40/10, Y02P90/30
Inventor: 刘涛, 何俊义, 苏亚男, 李敏, 吴永鑫
Owner SHENZHEN GENEPLUS CLINICAL LAB