A method, device and storage medium for repairing genome sequencing assembly results

A genome sequencing and genome assembly technology, applied in the field of repairing genome sequencing assembly results, that can solve problems such as incomplete assembly sequences, sequence truncation, and loss of assembly sequence

Active Publication Date: 2021-03-23
BGI TECH SOLUTIONS
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] (1) The Bionano molecular map is used to connect the assembly results directly, and conflicting regions are broken directly at the molecular markers. Because the molecular markers of a Bionano map are far apart, some sequences that are actually correct are also truncated, resulting in the loss of originally correct assembly sequence.
[0007] (2) In the traditional direct processing of the Bionano molecular map, regions whose molecular marker structures match but whose lengths are inconsistent are treated as structural variations (SV) and are not corrected in the assembly results. In practice, however, such regions may instead be caused by incomplete assembly sequence.
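
The length-consistency check described in [0007] can be illustrated with a minimal Python sketch. It assumes that the molecular alignment has already produced a list of aligned marker pairs, each giving the marker's coordinate in the assembly and in the Bionano map (both in bp), and it flags intervals between consecutive aligned markers whose lengths disagree. The function name, the tolerance parameters `rel_tol` and `abs_tol`, and their default values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    ref_start: int  # assembly coordinate of the left aligned marker (bp)
    ref_end: int    # assembly coordinate of the right aligned marker (bp)
    ref_len: int    # marker-to-marker distance in the assembly
    map_len: int    # marker-to-marker distance in the Bionano map


def length_inconsistent_intervals(
    aligned: List[Tuple[int, int]],
    rel_tol: float = 0.1,  # hypothetical relative tolerance
    abs_tol: int = 1000,   # hypothetical absolute tolerance (bp)
) -> List[Interval]:
    """Flag intervals between consecutive aligned markers whose lengths disagree.

    `aligned` holds (assembly_pos, map_pos) pairs sorted by assembly_pos.
    """
    flagged = []
    for (r1, m1), (r2, m2) in zip(aligned, aligned[1:]):
        ref_len = r2 - r1
        map_len = abs(m2 - m1)  # abs() tolerates a reverse-oriented alignment
        diff = abs(ref_len - map_len)
        # Require both an absolute and a relative difference so that small
        # label-sizing noise is not reported as a discrepancy.
        if diff > abs_tol and diff > rel_tol * max(ref_len, map_len):
            flagged.append(Interval(r1, r2, ref_len, map_len))
    return flagged


if __name__ == "__main__":
    pairs = [(0, 0), (10_000, 10_000), (20_000, 35_000), (30_000, 45_000)]
    for iv in length_inconsistent_intervals(pairs):
        print(iv)
```

In the demo the map interval (25 kb) is longer than the assembly interval (10 kb). Following the reasoning in [0007], such a region may reflect missing assembly sequence rather than a true structural variation, which is why it is worth verifying instead of being left uncorrected.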

Examples

Embodiment

[0073] In this example, a cereal plant genome of about 2.3 Gb was assembled. When second-generation sequencing data from insert libraries of 450 bp and 800 bp were used with the Pilon software to correct errors and fill gaps in the assembly, 8 gap sequences were found to have been filled. Each of these gaps was longer than 3 kb, and some regions with gaps longer than 40 kb were also filled. To verify the reliability of these filled sequences, the following processing was performed according to the genome sequencing assembly result repair method of this application:
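
Before the filled sequences can be verified, the original gap coordinates must be known. A common convention, assumed here rather than stated in the patent, is that scaffold gaps are encoded as runs of `N` in the pre-filling assembly. The minimal Python sketch below lists such runs longer than a threshold; the default of 3,000 bp mirrors the "greater than 3k" gaps in [0073]. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_GAP = re.compile(r"[Nn]+")  # a gap is a maximal run of N/n bases


def find_gaps(seq: str, threshold: int = 3000) -> List[Tuple[int, int]]:
    """0-based, half-open (start, end) coordinates of N runs longer than `threshold`."""
    return [
        (m.start(), m.end())
        for m in _GAP.finditer(seq)
        if m.end() - m.start() > threshold
    ]


def gaps_by_record(records: Dict[str, str], threshold: int = 3000) -> List[Tuple[str, int, int]]:
    """Apply find_gaps to every sequence, keeping the sequence ID with each gap."""
    return [
        (seq_id, start, end)
        for seq_id, seq in records.items()
        for start, end in find_gaps(seq, threshold)
    ]


if __name__ == "__main__":
    demo = {"scaffold1": "ACGT" + "N" * 5 + "ACG" + "NN" + "T"}
    print(gaps_by_record(demo, threshold=3))
```

Because filling changes sequence lengths, coordinates taken from the pre-filling assembly would still need to be carried over to the filled assembly before the filled regions can be marked.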

[0074] (1) Molecular alignment step

[0075] 1) Reference sequence preparation: convert the sequence file of the genome assembly result into a file consisting of the positions of the corresponding restriction sites; specifically, mark the IDs of the filled sequences and the position coordinates of the corresponding filled regions, ...
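
Step 1) converts the assembly sequence into label positions, an operation usually called in-silico digestion. The minimal Python sketch below assumes a single recognition motif and reports the 1-based start of every occurrence on either strand. The patent does not name the enzyme: the default `CTTAAG` (the motif recognized by Bionano's DLE-1 enzyme) and the alternative `GCTCTTC` (Nt.BspQI) are used only as examples, and the function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start of every (possibly overlapping) occurrence of motif in seq."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def label_positions(seq: str, motif: str = "CTTAAG") -> List[int]:
    """Sorted 1-based positions of motif on the forward or reverse strand.

    A reverse-strand site is reported at the forward-strand start of its
    reverse complement; palindromic motifs are counted once per site.
    """
    s, m = seq.upper(), motif.upper()
    sites = set(find_all(s, m)) | set(find_all(s, reverse_complement(m)))
    return sorted(p + 1 for p in sites)


if __name__ == "__main__":
    print(label_positions("CTTAAGNNNCTTAAG"))                   # DLE-1-style motif
    print(label_positions("AAGCTCTTCAAAGAAGAGCAA", "GCTCTTC"))  # Nt.BspQI-style motif
```

This sketch covers only the position-finding step. It does not reproduce any particular Bionano file format or the marking of filled regions.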

Abstract

The present application discloses a method, device and storage medium for repairing genome sequencing assembly results. The method includes: aligning the assembly result of the genome to be verified with the Bionano molecular map to find regions where the molecular markers of the two do not match or where the lengths between markers are inconsistent, and extending each such region by a preset length upstream and downstream in the genome sequence to obtain an abnormal region; separately analyzing the coverage of the abnormal region by second-generation and third-generation sequencing data; and repairing the abnormal region according to that coverage to obtain the repaired genome assembly result. By using second-generation sequencing, third-generation sequencing and the Bionano map jointly to repair genome assembly results, the method corrects structural errors introduced by complex genomic regions during assembly and avoids the excessive loss of assembly sequence that traditional Bionano verification causes when it operates on structurally conflicting regions. It can also process and verify regions where the molecular marker lengths in the Bionano map and the genome assembly are inconsistent, improving the accuracy and completeness of genome assembly.
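
The abnormal-region and coverage steps summarized above can be sketched in Python as follows. The sketch assumes that second-generation coverage is available as a per-base depth list for the sequence, and that third-generation alignments are available as (start, end) intervals on the same sequence. The decision rule (enough well-covered bases plus a minimum number of long reads spanning the whole region), the labels `supported`/`repair`, and all thresholds are hypothetical illustrations. The abstract states only that the abnormal region is repaired according to coverage.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    seq_id: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def extend_region(region: Region, flank: int, seq_len: int) -> Region:
    """Extend a region by `flank` bp on each side, clamped to the sequence ends."""
    return Region(region.seq_id, max(0, region.start - flank), min(seq_len, region.end + flank))


def covered_fraction(depth: Sequence[int], region: Region, min_depth: int) -> float:
    """Fraction of bases in the region whose short-read depth is at least min_depth."""
    window = depth[region.start:region.end]
    if not window:
        return 0.0
    return sum(1 for d in window if d >= min_depth) / len(window)


def spanning_reads(reads: List[Tuple[int, int]], region: Region) -> int:
    """Number of long-read alignments (start, end) that cover the whole region."""
    return sum(1 for s, e in reads if s <= region.start and e >= region.end)


def classify(
    region: Region,
    ngs_depth: Sequence[int],
    tgs_reads: List[Tuple[int, int]],
    min_depth: int = 5,      # hypothetical
    min_frac: float = 0.95,  # hypothetical
    min_span: int = 3,       # hypothetical
) -> str:
    """Return 'supported' if both data types back the assembly, else 'repair'."""
    frac = covered_fraction(ngs_depth, region, min_depth)
    span = spanning_reads(tgs_reads, region)
    return "supported" if frac >= min_frac and span >= min_span else "repair"


if __name__ == "__main__":
    abnormal = extend_region(Region("chr1", 40, 60), flank=10, seq_len=100)
    depth = [10] * 100
    reads = [(0, 100), (10, 90), (20, 80)]
    print(abnormal, classify(abnormal, depth, reads))
```

Requiring long reads to span the entire extended region is one way to check that single molecules support the assembled structure across the conflict. Short-read depth alone cannot confirm structure over multi-kilobase regions.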

Description

Technical field

[0001] The present application relates to the field of nucleic acid sequencing, and in particular to a method, device and storage medium for repairing genome sequencing assembly results.

Background art

[0002] At present, next-generation sequencing data obtained on the Illumina sequencing platform with the whole-genome shotgun (WGS) method offer high throughput, high speed, high accuracy and low cost. The platform can sequence DNA fragment libraries with different insert sizes, and in particular large-insert libraries, for example libraries with insert lengths greater than 1 kb. It has therefore been widely used in genome assembly analysis in recent years.

[0003] However, because next-generation sequencing produces short fragments and uses paired-end sequencing, it is difficult to process the sequencing data correctly for regions with high complexity within the...

Application Information

Patent type & authority: Patents (China)
IPC (8): G16B25/00; G16B30/10
CPC: G16B25/00; G16B30/00
Inventors: 贺丽娟 (He Lijuan), 刘亚斌 (Liu Yabin), 杨林峰 (Yang Linfeng), 邓天全 (Deng Tianquan), 陈露 (Chen Lu), 高强 (Gao Qiang)
Owner: BGI TECH SOLUTIONS