Genome structure variation distribution detection method and detection device

A technique for detecting the distribution of structural variation, applicable to genomics, sequence analysis, proteomics, and related fields. It addresses problems of rainfall plots such as limited plot size, data congestion, and the inability to distinguish overlapping events.

Pending Publication Date: 2022-03-04
SOUTHEAST UNIV +1

AI Technical Summary

Problems solved by technology

Traditionally, rainfall plots are used to visualize the distribution of variation along the genome. Because a standard plot has limited size, a rainfall plot may fail to distinguish overlapping events, especially when multiple data sets are drawn in the same plot, which leads to data congestion.

Method used

The method obtains and filters genome sequencing data, calculates the distance between adjacent variations, segments the genome with the piecewise constant fitting (PCF) algorithm, and visualizes the distribution of variations along the genome.

Examples

Embodiment 1

[0023] As shown in Figure 1, this embodiment proposes a method for detecting the distribution of genome structural variation. The method includes the following steps: S1, acquiring and filtering genome sequencing data; S2, calculating the distance between adjacent mutations; S3, segmenting the genome with the Piecewise Constant Fitting (PCF) algorithm; S4, visualizing the distribution of variation along the genome.

[0024] In some specific embodiments, the specific steps of S1, obtaining and filtering genome sequencing data, are as follows: cancer genome sequencing files can be obtained in two formats (VCF and MAF), which include the variant fields '#CHROM', 'POS', 'REF', 'ALT', and 'FILTER'; the variants are filtered on the 'FILTER' column of the file, and only the variant records whose 'FILTER' value is "PASS" are extracted.
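The filtering in S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it handles only tab-separated VCF text, the function name `read_pass_variants` is hypothetical, and the sample records are invented for the demonstration.

```python
import io


def read_pass_variants(handle):
    """Yield (chrom, pos, ref, alt) for VCF records whose FILTER is PASS.

    Hypothetical helper: assumes an uncompressed, tab-separated VCF stream.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = fields  # column names: #CHROM, POS, ..., FILTER, ...
            continue
        if header is None:
            raise ValueError("VCF column header (#CHROM ...) not found")
        record = dict(zip(header, fields))
        if record["FILTER"] == "PASS":
            yield (record["#CHROM"], int(record["POS"]),
                   record["REF"], record["ALT"])


# Invented sample data, for illustration only.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tT\t50\tPASS\t.\n"
    "chr1\t250\t.\tG\tC\t10\tLowQual\t.\n"
    "chr2\t75\t.\tC\tG\t60\tPASS\t.\n"
)
variants = list(read_pass_variants(demo))
print(variants)
```

Reading the column names from the `#CHROM` header line, rather than hard-coding column positions, keeps the sketch tolerant of extra per-sample columns.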

[0025] In some specific embodiments, the specific steps in S2, calculating the distance between adj...
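Because the source text for S2 is cut off here, the following is only a hedged sketch of one common way to compute distances between adjacent variants. It assumes that variants are grouped per chromosome, sorted by coordinate, and that the first variant on each chromosome (which has no predecessor) is dropped; the function name `inter_variant_distances` and the sample positions are hypothetical.

```python
from collections import defaultdict


def inter_variant_distances(variants):
    """Map chromosome -> list of (position, distance to previous variant).

    Assumption: distances are computed within each chromosome only, and the
    first variant on a chromosome is omitted because it has no predecessor.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    result = {}
    for chrom, positions in by_chrom.items():
        positions.sort()  # order by genome coordinate
        result[chrom] = [
            (curr, curr - prev)
            for prev, curr in zip(positions, positions[1:])
        ]
    return result


# Invented sample positions, for illustration only.
example = [("chr1", 500), ("chr1", 100), ("chr1", 130),
           ("chr2", 40), ("chr2", 90)]
distances = inter_variant_distances(example)
print(distances)
```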

Embodiment 2

[0048] As shown in Figure 2, this embodiment proposes a device for detecting the distribution of genome structural variation based on high-throughput sequencing technology, characterized in that it has an input module, a calculation module, a genome segmentation module, and a visualization module:

[0049] A: Input module, which contains two file-reading units: a VCF (Variant Call Format) unit and a MAF (Mutation Annotation Format) unit;

[0050] B: Calculation module, which sorts the mutations by genome coordinate, calculates the distance between adjacent mutations, and outputs new mutation coordinates;

[0051] C: Genome segmentation module, which uses the Piecewise Constant Fitting (PCF) algorithm to segment the genome, and outputs the position of each segment and the number of variations it contains;

[0052] D: Visualization module, which shows the distribution of variation along the genome.
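The excerpt does not describe how the genome segmentation module implements PCF, so the sketch below uses a generic formulation as a stand-in: a penalized least-squares piecewise-constant fit solved by dynamic programming. The penalty `gamma`, the function name `pcf_segments`, the choice of log10 inter-mutation distances as input, and the sample values are all assumptions made for illustration.

```python
import math


def pcf_segments(values, gamma):
    """Fit a piecewise-constant curve minimizing squared error + gamma per segment.

    Returns a list of (start, end, mean) with end exclusive. Generic O(n^2)
    dynamic program; not necessarily the variant used in the patent.
    """
    n = len(values)
    # Prefix sums give the squared error of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + gamma
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through the stored breakpoints.
    segments, j = [], n
    while j > 0:
        i = prev[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments


# Invented log10 inter-mutation distances: sparse, dense, sparse.
log_dist = [5.0, 5.1, 4.9, 1.0, 1.2, 0.9, 1.1, 5.0, 5.2]
segments = pcf_segments(log_dist, gamma=1.0)
for start, end, mean in segments:
    print(start, end, end - start, round(mean, 2))
```

Each returned segment carries its boundaries, and `end - start` gives the number of variations it contains, which corresponds to the two outputs listed for module C. A segment with a low mean log distance marks a region where mutations are densely clustered.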

[0053] In some specific em...


Abstract

The invention belongs to the field of genome data analysis and discloses a method and a device for detecting the distribution of genome structural variation. The detection method comprises the following steps: obtaining and filtering genome sequencing data, calculating the distance between adjacent variations, segmenting the genome with a piecewise constant fitting (PCF) algorithm, and visualizing the distribution of variations along the genome. The detection device comprises an input module, a calculation module, a genome segmentation module, and a visualization module. Based on high-throughput sequencing data, the method is suitable for VCF or MAF files of any genome; the PCF algorithm detects how genome structural variation is distributed along the chromosomes, so that mutation hot-spot regions of the genome can be found easily.

Description

Technical field

[0001] The invention relates to the technical field of genome data analysis, in particular to a detection method and a detection device for the distribution of genome structural variation.

Background

[0002] Structural variations in the human genome sequence are insertions, deletions, or rearrangements of segments of DNA sequence ranging in length from approximately 1,000 to millions of base pairs. Over the past few years, structural variation has been found to be much more prevalent in the human genome than previously thought, and it is not randomly distributed across the genome. The study of structural variation is of great significance to the study of genome evolution, population polymorphism analysis, and disease susceptibility. With the development of second-generation high-throughput sequencing technology, the map of structural variation in the human genome has been studied comprehensively and in depth. Traditionally, the rainfall plot is used to visu...

Application Information

IPC(8): G16B20/20, G16B20/50, G16B30/00
CPC: G16B20/20, G16B20/50, G16B30/00
Inventor 李健林雪刘安娜许利群孙泽鹏乔丰刘新龙
Owner SOUTHEAST UNIV