Pathogenic microorganism genome database and establishment method thereof

A pathogenic microorganism genome database and a method for establishing it, applied in the field of pathogen metagenomics. The method addresses problems such as increased analysis cost, false-negative test results, and long analysis times, with the effect of reducing the demand for analysis and computing resources, improving detection accuracy, and shortening the analysis time.

Active Publication Date: 2019-11-19
GZ VISION GENE TECH CO LTD +4

AI Technical Summary

Problems solved by technology

[0004] At present, most metagenomic analysis workflows either select a single strain as the representative genome of a species, chosen at random or through clustering analysis, or include all strains in the database indiscriminately. Both approaches have their own advantages and shortcomings.
Selecting a single representative strain keeps the database small and the analysis fast, but the drawbacks are significant. Microbial genomes evolve rapidly, and strains of the same species differ in their genomes: strains isolated in different regions, at different times, or even at the same time may yield different sequenced genomes, and for some fast-evolving species the genome difference between strains can exceed 3%.
[0005] At present, in clinical applications of pathogen metagenomics, the number of sequences detected for most pathogens ranges from tens to hundreds, or even single digits, and the genome coverage is below 0.1%, whereas genome variation among strains can reach 3% and some strains even carry unique sequences. With only one strain selected as the representative genome of a species, variant regions or strain-unique regions are therefore difficult to detect at a coverage of about 0.1%, which often leads to missed detections and false-negative results.
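The coverage figure in [0005] can be checked with simple arithmetic. The following is a minimal sketch, not part of the patent: the function name, the read length of 50 bp, and the 5 Mb genome length are hypothetical values chosen only for illustration, and the result is an upper bound because it assumes that no two reads overlap.

```python
def coverage_upper_bound(n_reads, read_length, genome_length):
    """Upper bound on the fraction of genome positions covered by reads.

    Assumes no two reads overlap, so the true covered fraction
    can only be equal to or lower than this value.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(1.0, n_reads * read_length / genome_length)


# Hypothetical values: 50 detected reads of 50 bp on a 5 Mb bacterial genome.
example = coverage_upper_bound(n_reads=50, read_length=50, genome_length=5_000_000)
print(f"{example:.2%}")  # prints 0.05%
```

Under these assumed values, a few dozen reads cover well under 0.1% of the genome, which is consistent with the range stated above.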
[0006] Including the genomes of all strains of a species in the database effectively avoids such missed detections, but the shortcomings of this method are equally obvious.
On the one hand, once the genomes of all strains are included, the database becomes very large and the analysis takes a very long time, sometimes more than a day. This is unacceptable given the timeliness requirements of clinical applications, where treating a patient even an hour earlier can make a difference. The demand for computing servers or clusters also rises sharply, and the analysis cost rises with it. On the other hand, the sequencing quality of strain genomes from public databases is uneven: some strains contain contaminating sequences or are even taxonomically misclassified. If such strains are not screened and filtered out, they easily lead to false-positive results, which causes great difficulty for clinical diagnosis and treatment.

Examples

Embodiment 1

[0048] A pathogenic microorganism genome database is established by the following method:

[0049] 1. Data acquisition

[0050] Download bacterial genome data from PATRIC as follows:

[0051] PATRIC is the Pathosystems Resource Integration Center, a US-based resource center for pathogenic microorganisms. Its website contains most of the currently known bacterial pathogen genome data, and all collected bacterial genome data can be downloaded from its FTP server.

[0052] From the PATRIC FTP server (ftp://ftp.patricbrc.org/), all genome data classified as archaea and bacteria were downloaded, together with the corresponding genome information file PATRIC_genome.txt, which contained genome information for 227,577 strains in total.

[0053] 2. Strain Genome Screening

[0054] According to the header information of the file, select the genomes whose "Public" column is "True", "Genome Status" is "Complete", and "Genome Quality" is "Good". After screening, 13537 ge...
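The screening rule in [0054] can be sketched as a filter over the genome information table. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not say how PATRIC_genome.txt is delimited, so a tab-delimited layout is assumed, and the "Genome ID" and "Genome Name" columns and the four sample rows are made up. Only the three column names and their required values come from the text.

```python
import csv
import io


def screen_genomes(handle, delimiter="\t"):
    """Return the rows that meet the three criteria quoted in [0054]."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    kept = []
    for row in reader:
        if (
            (row.get("Public") or "").strip() == "True"
            and (row.get("Genome Status") or "").strip() == "Complete"
            and (row.get("Genome Quality") or "").strip() == "Good"
        ):
            kept.append(row)
    return kept


# Made-up sample table; the real file is assumed (not confirmed) to be tab-delimited.
sample = (
    "Genome ID\tGenome Name\tPublic\tGenome Status\tGenome Quality\n"
    "g1\tstrain A\tTrue\tComplete\tGood\n"
    "g2\tstrain B\tTrue\tWGS\tGood\n"
    "g3\tstrain C\tFalse\tComplete\tGood\n"
    "g4\tstrain D\tTrue\tComplete\tPoor\n"
)
kept = screen_genomes(io.StringIO(sample))
print([row["Genome ID"] for row in kept])  # prints ['g1']
```

For a real file, the same function could be called with an opened file handle in place of the `io.StringIO` object.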

Embodiment 2

[0076] To evaluate the fusion genome of Klebsiella pneumoniae constructed in Embodiment 1 above, three references were analyzed and compared in terms of accuracy and analysis time: all unprocessed strain genomes of Klebsiella pneumoniae, the NCBI reference strain genome of Klebsiella pneumoniae, and the Klebsiella pneumoniae fusion genome described above.

[0077] 1. Data volume evaluation and comparison

Abstract

The invention relates to a pathogenic microorganism genome database and a method for establishing it, and belongs to the technical field of metagenomics. The method comprises the following steps: data acquisition, in which pathogenic microorganism genome data are obtained; strain genome screening, in which the strain genomes of a species are selected according to a predetermined screening rule; plasmid sequence removal, in which plasmid sequences present in the strain genomes obtained in the previous step are removed; filtration, in which strains with incorrect labeling information, incomplete chromosome assembly, or incorrect classification are removed according to a predetermined filtering rule to obtain the reference strain genomes of the species; fusion genome construction, in which the reference strain genomes are fragmented, redundancy is removed, and the sequences are reassembled to obtain a fusion genome of the species; and database assembly, in which the above steps are repeated to obtain the fusion genomes of the predetermined species, which are then pooled to obtain the pathogenic microorganism genome database. The genome database has the advantages of high accuracy, short analysis time, and reduced cost.
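The fusion genome construction step named in the abstract (fragment, remove redundancy, reassemble) can be illustrated with a toy sketch. It is not the patent's method: the abstract does not specify the fragment length, the redundancy criterion, or the assembly procedure, so this example substitutes fixed-size non-overlapping windows, exact-match deduplication, and plain concatenation. All function names and sequences are hypothetical.

```python
def fragment(seq, size):
    """Cut a sequence into consecutive, non-overlapping windows of `size`."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_fusion(genomes, size=4):
    """Toy fusion: keep each distinct fragment once, in first-seen order.

    Exact string matching stands in for redundancy removal, and
    concatenation stands in for reassembly; both are simplifications.
    """
    seen = set()
    kept = []
    for seq in genomes:
        for frag in fragment(seq, size):
            if frag not in seen:
                seen.add(frag)
                kept.append(frag)
    return "".join(kept)


# Two made-up "strain genomes" sharing their first 8 bases.
genomes = ["AAAACCCCGGGG", "AAAACCCCTTTT"]
fusion = build_fusion(genomes)
print(fusion)                                    # prints AAAACCCCGGGGTTTT
print(len(fusion), sum(len(g) for g in genomes))  # prints 16 24
```

The toy result retains the strain-specific segment of each input while storing the shared segment only once, which is the trade-off between database size and strain coverage that the abstract describes.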

Description

Technical field

[0001] The invention relates to the technical field of metagenomics, in particular to a pathogenic microorganism genome database and a method for establishing it.

Background art

[0002] Pathogen metagenomic (transcriptome) sequencing is an emerging technology applied to the detection of clinical pathogenic infections. It has the advantages of broad pathogen coverage, high sensitivity, high accuracy, and fast turnaround, and it is gradually becoming a first-line method for detecting pathogenic infections in clinical practice. One of the core technologies of pathogen metagenomics is the pathogenic microorganism genome database. The quality of the database directly affects the number of pathogenic microorganisms that can be detected, the detection accuracy, and the analysis performance.

[0003] The pathogenic microorganism genome database is composed of genomes of various species, most of which are collected in public databases such as NCBI, and con...

Application Information

IPC(8): G16B35/10
CPC: G16B35/10
Inventor: 许腾, 陈文景, 李永军, 王小锐, 苏杭
Owner: GZ VISION GENE TECH CO LTD