
Method and device for repairing genome sequencing and assembling results, and storage medium

A genome sequencing and assembly technique applied to the repair of genome sequencing and assembly results. It addresses problems such as uncorrected errors, incomplete assembly sequences, and loss of assembled sequence.

Active Publication Date: 2019-10-08
BGI TECH SOLUTIONS

AI Technical Summary

Problems solved by technology

[0006] (1) The Bionano molecular map is used to join the assembly results directly, and conflicting regions are broken directly at the molecular markers. Because the molecular markers of a Bionano molecular map are far apart, some sequences that are actually normal are also truncated, resulting in the loss of the original correct assembl



Examples


Embodiment

[0073] In this example, for a cereal plant genome of about 2.3 Gb, Pilon software was used during genome assembly to correct errors and fill gaps with second-generation sequencing data from insert libraries of 450 bp and 800 bp. Eight gap sequences were found to have been filled. Each of these gaps was longer than 3 kb, and some filled regions had gap lengths of more than 40 kb. To verify the reliability of the filled sequences, the following processing was performed according to the genome sequencing assembly result repair method of this application:

[0074] (1) Molecular comparison step

[0075] 1) Reference sequence preparation: convert the sequence file of the genome assembly result into a file consisting of the positions of the corresponding restriction sites; specifically, mark the ID information of the filled sequences and the position coordinates of the corresponding filled regions, ...
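The reference preparation step, turning an assembled sequence into a list of restriction-site positions, can be illustrated with a minimal sketch. The source does not name the enzyme or the tool used, so the default motif `GCTCTTC` and the function names `reverse_complement` and `restriction_sites` are assumptions chosen only for illustration. The sketch scans both strands and reports 0-based start positions on the forward strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def restriction_sites(seq: str, motif: str = "GCTCTTC") -> list:
    """Return sorted 0-based start positions of the motif on either strand.

    The default motif is an assumption for illustration; the source does not
    state which recognition site was used.
    """
    seq = seq.upper()
    # A site on the reverse strand appears as the reverse complement
    # of the motif on the forward strand.
    motifs = {motif.upper(), reverse_complement(motif)}
    positions = set()
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            positions.add(start)
            # Advance by one so overlapping occurrences are also found.
            start = seq.find(m, start + 1)
    return sorted(positions)


if __name__ == "__main__":
    print(restriction_sites("AAGCTCTTCAAGAAGAGCTT"))
```

In practice the positions for each sequence would be written out together with the sequence ID, so that the filled regions noted above can be located relative to the nearest sites.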



Abstract

The invention discloses a method and a device for repairing genome sequencing and assembly results, and a storage medium. The method comprises the following steps: comparing a genome assembly result to be verified with a Bionano molecular map to find regions where the molecular markers do not match or the lengths are inconsistent, and extending the genome sequence of each such region upstream and downstream by a preset length to form abnormal regions; analyzing the coverage of the abnormal regions by second-generation data and by third-generation data, respectively; and repairing the abnormal regions according to the coverage to obtain a repaired genome assembly result. By combining second-generation sequencing, third-generation sequencing and the Bionano map, the method corrects structural errors introduced by regional complexity during genome assembly, avoids the excessive loss of assembled sequence that occurs when traditional Bionano verification handles regions of structural conflict, and can also process and verify regions whose molecular lengths are inconsistent between the Bionano map and the genome assembly result, improving the accuracy and completeness of the genome assembly.
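The region-extension and coverage steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `extend_regions` and `mean_coverage`, the half-open `(start, end)` coordinate convention, the merging of overlapping extended regions, and the use of a per-base depth list are all assumptions. The source does not state how coverage is summarized or what rule decides the repair, so no decision rule is shown.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # half-open interval [start, end), an assumed convention


def extend_regions(conflicts: Sequence[Region], flank: int, seq_len: int) -> List[Region]:
    """Extend each conflict region by `flank` bases on both sides.

    Coordinates are clamped to [0, seq_len]. Overlapping extended regions
    are merged, which is an assumption made here to avoid double handling.
    """
    extended = sorted(
        (max(0, start - flank), min(seq_len, end + flank))
        for start, end in conflicts
    )
    merged: List[Region] = []
    for start, end in extended:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: grow it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mean_coverage(depth: Sequence[int], region: Region) -> float:
    """Mean per-base depth over a region; 0.0 for an empty region."""
    start, end = region
    window = depth[start:end]
    return sum(window) / len(window) if window else 0.0


if __name__ == "__main__":
    regions = extend_regions([(100, 200), (250, 300), (900, 950)], flank=50, seq_len=1000)
    print(regions)
```

The same `mean_coverage` call could be applied once to a depth track from second-generation reads and once to a depth track from third-generation reads, giving the two coverage values that the repair step then compares.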

Description

Technical field

[0001] The present application relates to the field of nucleic acid sequencing, in particular to a method, device and storage medium for repairing genome sequencing assembly results.

Background

[0002] At present, next-generation sequencing data obtained on the Illumina sequencing platform using the whole genome shotgun (WGS) method offer high throughput, high speed, high accuracy and low cost. The platform can sequence DNA fragment libraries of different insert sizes, in particular large-fragment libraries, for example libraries with insert lengths greater than 1 kb, and has been widely used in genome assembly analysis in the past few years.

[0003] However, because next-generation sequencing produces short reads and relies on paired-end sequencing, it is difficult to correctly process the sequencing data for regions with high complexity within the...


Application Information

IPC(8): G16B25/00, G16B30/10
CPC: G16B25/00, G16B30/00
Inventor: 贺丽娟, 刘亚斌, 杨林峰, 邓天全, 陈露, 高强
Owner: BGI TECH SOLUTIONS