Simulated cancer genome sequencing data generating device

A genome sequencing data generation technology, applied in sequence analysis, bioinformatics, and informatics. It addresses the problem that some variant types cannot be simulated, which means the software that detects them cannot be evaluated. It supports evaluating and improving the performance of such software.

Active Publication Date: 2019-07-16
ZHEJIANG ANNOROAD BIO TECH CO LTD +1
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

The advantage of the existing technique is that it uses real sequencing data to simulate the background error of genome sequencing. Its main disadvantage is that it can only simulate SNV mutations, not INDEL, CNV, or FUSION mutations, so it cannot be used to evaluate software that detects INDELs, CNVs, and fusions.



Examples


Embodiment 1

[0040] Objective:

[0041] Generate simulated cancer genome sequencing data containing SNPs and INDELs at a gradient of mutation frequencies, and use the data to evaluate the performance of three variant-detection tools: GATK, MuTect, and VarScan.
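This excerpt does not say how "performance" is scored. A common approach, assumed here, is to match each tool's calls against the known (simulated) variants and report sensitivity and precision, broken down by frequency tier. The sketch below uses exact matching on chromosome, position, REF and ALT. The `Variant` type and the functions `evaluate_calls` and `sensitivity_by_tier` are hypothetical, and parsing the VCF output of GATK, MuTect or VarScan is left out.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, VCF-style
    ref: str
    alt: str


def evaluate_calls(truth: Iterable[Variant], calls: Iterable[Variant]) -> dict:
    """Compare one caller's output with the simulated truth set (exact match)."""
    truth_set, call_set = set(truth), set(calls)
    tp = len(truth_set & call_set)
    fp = len(call_set - truth_set)
    fn = len(truth_set - call_set)
    sensitivity = tp / (tp + fn) if truth_set else 0.0
    precision = tp / (tp + fp) if call_set else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "precision": precision, "F1": f1}


def sensitivity_by_tier(truth_by_tier: Dict[float, Iterable[Variant]],
                        calls: Iterable[Variant]) -> Dict[float, float]:
    """Fraction of known variants recovered within each mutation-frequency tier."""
    call_set = set(calls)
    result = {}
    for tier, variants in truth_by_tier.items():
        known = set(variants)
        result[tier] = len(known & call_set) / len(known) if known else 0.0
    return result


if __name__ == "__main__":
    truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "CT"),
             Variant("chr2", 50, "G", "T")]
    calls = [Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "T"),
             Variant("chr2", 75, "T", "A")]
    print(evaluate_calls(truth, calls))
```

Reporting sensitivity per tier is what makes a frequency gradient useful: it shows at which allele frequency each tool starts to miss variants.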

[0042] Steps:

[0043] 1. The capture-region file test.bed (254,757 bp) is randomly divided into 7 parts, used respectively to generate FASTQ files with mutation frequencies of 0.005, 0.01, 0.05, 0.1, 0.5, 0.9, and 1.

[0044] 2. Obtain the reference-sequence FASTA file corresponding to each region file.

[0045] 3. Generate a variant FASTA file (containing 227 SNPs and 23 INDELs) from the reference sequence with snpfreq=0.001 and indfreq=0.001/10.

[0046] 4. At a sequencing depth of 1000× and a paired-end (PE) read length of 75 bp, simulate FASTQ files for each mutation frequency, then merge the FASTQ files of all frequencies.

[0047] 5. Align the FASTQ files to obtain a BAM file, then extract the alignments that fall within the test.bed regions.

[0048] 6....
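Steps 1 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Regions are split at the level of whole BED intervals (the text only says "randomly divided"). Reads are error-free, the insert size is a hypothetical parameter, and each fragment is drawn from the mutated sequence with probability equal to the target mutation frequency. The helpers `split_regions`, `read_pairs_needed`, `revcomp` and `simulate_pairs` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

# Mutation-frequency tiers from step 1 of Embodiment 1.
FREQUENCIES = [0.005, 0.01, 0.05, 0.1, 0.5, 0.9, 1.0]
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

Interval = Tuple[str, int, int]  # (chrom, start, end): BED-style, 0-based, half-open


def split_regions(intervals: List[Interval], n_parts: int,
                  seed: int = 0) -> List[List[Interval]]:
    """Shuffle whole BED intervals and deal them round-robin into n_parts groups."""
    rng = random.Random(seed)
    shuffled = list(intervals)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def read_pairs_needed(region_bp: int, depth: int, read_len: int) -> int:
    """Number of read pairs so that 2 * read_len * pairs is about depth * region_bp."""
    return round(depth * region_bp / (2 * read_len))


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(ref_seq: str, mut_seq: str, freq: float, n_pairs: int,
                   read_len: int, insert_size: int,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Error-free paired-end reads; each fragment comes from mut_seq with
    probability freq and from ref_seq otherwise (hypothetical model)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        template = mut_seq if rng.random() < freq else ref_seq
        start = rng.randint(0, len(template) - insert_size)
        fragment = template[start:start + insert_size]
        # Read 1 is the fragment's left end; read 2 is the reverse complement of its right end.
        pairs.append((fragment[:read_len], revcomp(fragment[-read_len:])))
    return pairs


if __name__ == "__main__":
    bed = [("chr1", 1000 * i, 1000 * i + 500) for i in range(14)]
    tiers: Dict[float, List[Interval]] = dict(zip(FREQUENCIES, split_regions(bed, 7)))
    for freq, group in tiers.items():
        bp = sum(end - start for _, start, end in group)
        print(freq, len(group), bp, read_pairs_needed(bp, 1000, 75))
```

For the whole test.bed region, `read_pairs_needed(254757, 1000, 75)` gives 1,698,380 pairs, i.e. 1000 × 254,757 / (2 × 75). Drawing the source sequence per fragment makes the realized allele fraction fluctuate around the target, which matters most for the 0.005 tier, where only about 5 of 1000 covering reads are expected to carry a given variant.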


Abstract

The invention relates to a device for generating simulated cancer genome sequencing data. The device comprises a human reference genome sequence position information acquisition module, a capture-region reference genome sequence acquisition module, a cancer genome variation data simulation module, a simulated cancer genome sequencing data generation module, and a simulated cancer genome sequencing data output module. The invention provides an algorithm and a device that can simulate various kinds of variations, so that the generated simulated sequencing data are suitable for evaluating the performance of various kinds of detection software.
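The cancer genome variation data simulation module has to produce two things: a mutated sequence and an exact record of every variant introduced, since that record is the known-mutation set used to score detection software. The sketch below is a minimal illustration of that idea. It assumes per-base SNP and INDEL rates in the spirit of the `snpfreq` and `indfreq` parameters used in Embodiment 1, but how the patent's tool actually interprets those parameters is not stated here. The function `simulate_variants`, the `TruthVariant` type, the 50/50 insertion/deletion split and the `max_indel` limit are all hypothetical. CNV and fusion events are not modeled.

```python
import random
from typing import List, NamedTuple, Tuple

BASES = "ACGT"


class TruthVariant(NamedTuple):
    pos: int   # 1-based position of the REF allele's first base, VCF-style
    ref: str
    alt: str
    kind: str  # "SNP", "INS" or "DEL"


def simulate_variants(ref: str, snp_rate: float, indel_rate: float,
                      max_indel: int = 3,
                      seed: int = 0) -> Tuple[str, List[TruthVariant]]:
    """Return (mutated sequence, truth set) for one reference sequence."""
    rng = random.Random(seed)
    out: List[str] = []
    truth: List[TruthVariant] = []
    i = 0
    while i < len(ref):
        base = ref[i]
        r = rng.random()
        if r < snp_rate:
            # Substitute a different base at this position.
            alt = rng.choice([b for b in BASES if b != base])
            out.append(alt)
            truth.append(TruthVariant(i + 1, base, alt, "SNP"))
            i += 1
        elif r < snp_rate + indel_rate:
            size = rng.randint(1, max_indel)
            if rng.random() < 0.5:
                # Insertion: keep the anchor base and append random bases after it.
                ins = "".join(rng.choice(BASES) for _ in range(size))
                out.append(base + ins)
                truth.append(TruthVariant(i + 1, base, base + ins, "INS"))
                i += 1
            else:
                # Deletion: keep the anchor base and skip the next `size` bases.
                deleted = ref[i + 1:i + 1 + size]
                out.append(base)
                if deleted:
                    truth.append(TruthVariant(i + 1, base + deleted, base, "DEL"))
                i += 1 + len(deleted)
        else:
            out.append(base)
            i += 1
    return "".join(out), truth
```

A call such as `simulate_variants(region_seq, 0.001, 0.0001)` returns the variant sequence together with its truth set; writing the sequence to FASTA and the truth set to VCF would be the output module's job.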

Description

Technical field

[0001] The invention relates to the field of cancer gene detection, and in particular to a device for generating simulated cancer genome NGS sequencing data.

Background

[0002] Accurate detection of somatic alterations in the genomes of cancer patients is key to understanding cancer progression, patient survival, and response to therapy. Evaluating and improving the performance of mutation-detection software requires a set of known mutations. Sequencing data carrying known mutations can be produced by an algorithm that simulates cancer genome sequencing data; the advantage of this approach is that it places no special requirements on the sequencing method.

[0003] At present, to simulate genome sequencing data, BAMSurgeon first aligns real sequencing reads to obtain a BAM file and then modifies fixed sites in that BAM file, so as to obtain the b...
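The BAM-editing strategy attributed to BAMSurgeon in the background can be illustrated with a toy example: take reads that are already aligned, find those covering a chosen site, and overwrite the base in a fraction of them equal to the target allele frequency. This is a hypothetical sketch, not BAMSurgeon's code. It assumes ungapped alignments held in memory as simple `AlignedRead` records rather than a real BAM file, and it handles only a single SNV; `spike_snv` is a hypothetical name.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRead:
    name: str
    start: int  # 0-based leftmost reference position (ungapped alignment assumed)
    seq: str


def spike_snv(reads: List[AlignedRead], pos: int, alt: str, vaf: float,
              seed: int = 0) -> int:
    """Overwrite the base at reference position pos (0-based) with alt in
    round(vaf * coverage) randomly chosen covering reads; return that count."""
    rng = random.Random(seed)
    covering = [r for r in reads if r.start <= pos < r.start + len(r.seq)]
    n_alt = round(vaf * len(covering))
    for read in rng.sample(covering, n_alt):
        offset = pos - read.start
        read.seq = read.seq[:offset] + alt + read.seq[offset + 1:]
    return n_alt


if __name__ == "__main__":
    reads = [AlignedRead(f"r{i}", i, "A" * 10) for i in range(10)]
    edited = spike_snv(reads, pos=9, alt="G", vaf=0.5)
    print(edited, [r.seq for r in reads])
```

Because the edit is made at a fixed reference position in reads that already exist, this approach needs separate editing logic for each variant type; an SNV is the simplest case.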


Application Information

IPC(8): G16B30/00
CPC: G16B30/00
Inventors: 荆瑞琳 (Jing Ruilin), 王娟 (Wang Juan), 李大为 (Li Dawei), 玄兆伶 (Xuan Zhaoling), 王海良 (Wang Hailiang)
Owner ZHEJIANG ANNOROAD BIO TECH CO LTD