Method based on repetitive sequence recognition for splicing sequencing data of whole genome

A whole-genome sequencing technique based on repetitive sequence identification, applied in the field of genetic engineering, that addresses splicing (assembly) errors in the large genomes of higher animals and plants.

Status: Inactive; Publication Date: 2002-07-24
北京六合华大基因科技有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to propose a splicing (assembly) method for whole-genome sequencing data based on repetitive sequence identification. By analyzing how repetitive sequences in the genomes of higher animals and plants behave under "shotgun" sequencing, the method addresses the data splicing errors that repetitive sequences cause in the large genomes of higher animals and plants, and thus provides a reliable means for efficient and rapid whole-genome sequencing analysis of these organisms using the shotgun method.


Examples


Embodiment Construction

[0026] Each step of the method of the invention is described in detail below with reference to the accompanying drawings.

[0027] To identify repetitive sequences, the invention first sets a minimum fragment length, generally 15 bp to 20 bp; repetitive sequences shorter than this length are not considered. To simplify the model, all sequencing reads are assumed to have the same length, L.

[0028] The parameters in the following formulas are: G, the total genome length; L, the average effective read length; N, the number of successful sequencing reactions; F, the minimum fragment length used for identification.

[0029] Count the occurrences of small non-repeated fragments in shotgun sequencing:

[0030] Define a random variable Y_{ik} to describe the event that the above-mentioned DNA fragment of the specified length appears k times in whole-genome shotgun sequencing:

[0031] If the number of occurrences of fragments starting from a ce...
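The patent's own formulas are truncated in this listing, so the following is only a minimal sketch of one standard way to model the counting described in [0027] to [0030], not the patent's actual derivation. It assumes that read start positions are uniformly random along the genome, that every read has length L, and that a non-repeated fragment of length F is "seen" when a read contains it completely. Under those assumptions the number of occurrences is binomial with N trials. The function names and the tail-probability cutoff `alpha` are hypothetical choices made for illustration.

```python
import math

def fragment_hit_probability(G, L, F):
    """Chance that one uniformly placed read of length L fully contains
    a given F-bp fragment (assumed model, not taken from the patent)."""
    # A read has G - L + 1 possible start positions; L - F + 1 of them
    # cover the whole fragment.
    return (L - F + 1) / (G - L + 1)

def occurrence_pmf(k, G, L, N, F):
    """P(a unique F-bp fragment appears exactly k times among N reads),
    i.e. a Binomial(N, p) probability computed in log space."""
    p = fragment_hit_probability(G, L, F)
    log_pmf = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
               + k * math.log(p) + (N - k) * math.log1p(-p))
    return math.exp(log_pmf)

def repeat_threshold(G, L, N, F, alpha=1e-3):
    """Smallest count k such that a unique fragment reaches k occurrences
    with probability below alpha (hypothetical decision rule)."""
    cumulative = 0.0
    for k in range(N + 1):
        # 1 - cumulative is P(count >= k) for a unique fragment.
        if 1.0 - cumulative < alpha:
            return k
        cumulative += occurrence_pmf(k, G, L, N, F)
    return N + 1

if __name__ == "__main__":
    # Toy numbers chosen only to keep the example fast.
    G, L, N, F = 1000, 100, 50, 20
    print("expected count:", N * fragment_hit_probability(G, L, F))
    print("threshold:", repeat_threshold(G, L, N, F))
```

A fragment observed at least `repeat_threshold(...)` times would, under this sketch, be a candidate repeat, since a unique fragment is unlikely to reach that count by chance.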


Abstract

A method for splicing (assembling) whole-genome sequencing data based on repetitive sequence recognition. The method comprises: calculating the probability distributions of non-repetitive and repetitive fragments in the sequencing data; determining a criterion for recognizing repetitive sequences; masking the repetitive sequences according to that criterion; splicing in groups according to the size of the target genome; restoring the N characters in the large fragments to their original bases; finding related large fragments and the reads between them and linking them together; and ordering them to obtain a working framework map of the target genome. Its advantages are high efficiency and high accuracy.
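The masking step in the abstract can be illustrated with a small sketch. It assumes that repeats are detected by counting fixed-length fragments (k-mers) across all reads and that any fragment seen at least `threshold` times is replaced with N; the abstract does not give these details, so the function names, the threshold value, and the decision to ignore reverse complements are all illustrative assumptions. The original reads are left untouched, so the masked bases can be restored after assembly.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-length fragment across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def mask_repeats(read, counts, k, threshold):
    """Return a copy of `read` with every k-mer seen >= threshold times
    replaced by N. The input read itself is not modified."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        # Look up the k-mer in the original read, not the masked copy.
        if counts[read[i:i + k]] >= threshold:
            masked[i:i + k] = "N" * k
    return "".join(masked)

if __name__ == "__main__":
    # Toy reads sharing the fragment ACGTAC; threshold is hypothetical.
    reads = ["AAACGTACTT", "GGACGTACCC", "TTACGTACGG"]
    counts = count_kmers(reads, k=6)
    for read in reads:
        print(read, "->", mask_repeats(read, counts, k=6, threshold=3))
```

In practice the threshold would come from the probability model for non-repetitive fragments rather than being fixed by hand.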

Description

Technical field

[0001] The invention relates to a splicing method for whole-genome sequencing data based on repetitive sequence recognition, and belongs to the technical field of genetic engineering.

Background

[0002] Genomics is the comprehensive analysis of the complete set of genetic material of an organism, aimed at understanding the function and role of genetic information from a holistic perspective. The most important step is to determine the organism's complete set of genetic information, that is, the sequence of all of its nucleic acid bases; this is whole-genome sequencing analysis. At present, whole-genome sequencing mainly adopts two strategies: 1. The "hierarchical cloning method", in which the larger genome is first broken into medium-sized fragments (150 kb to 300 kb) and cloned, the medium fragments are then broken into small fragments (1 kb to 3 kb) for sequencing, and the data are finally spliced by computer. For examp...


Application Information

IPC(8): C12P19/34, C12Q1/68
Inventor: 李松岗, 王俊, 盖伊·王, 于军, 汪建, 杨焕明, 倪培相, 韩玉军, 黄显刚, 张建国, 胡咏武
Owner 北京六合华大基因科技有限公司