Optimized method for analyzing microbial communities through metagenome binning

A metagenomic and microbiota technology, applied in the fields of high-throughput sequencing and bioinformatics analysis of sequencing data. It addresses problems such as the difficulty of providing bin gene annotations, the risk of missing target new species, and users' lack of an intuitive understanding of the results.

Active Publication Date: 2020-11-13
GUANGZHOU GENE DENOVO BIOTECH

AI Technical Summary

Problems solved by technology

[0011] 1) Single assembly strategy
Existing processes do not fully consider the sample type (for example, natural environment samples or symbiotic bacteria samples), the amount of sequencing data per sample (for example, 6G, 10G, 20G, or 100G), or the number of biological replicates. They use the software's default assembly parameters directly, without testing or evaluation, which leads to non-targeted assembly, low computing efficiency, and higher resource usage;
[0012] 2) Annotation uses the software's default parameters without comparison or optimization, and relies on a single annotation strategy
Although some patents already describe a metagenomics-based binning analysis process, they omit the data correction step because it is time-consuming and resource-intensive; this omission directly affects the accuracy of bin clustering and bin identification;
[0016] 3) It is difficult to provide detailed bin gene annotations
[0017] 3) The reference value of bin species annotation results is extremely low
The default output of the existing process is the genus-level annotation of each bin, or NA if the genus is unknown. Showing only the genus level is too abstract: the user gains no intuitive understanding and cannot obtain effective reference information. Annotations at other taxonomic levels have to be looked up in the database, and because one sample may yield hundreds of high-quality bins, the workload is relatively large. A single bacterial genome can be treated as a specific species, but the existing process does not provide species-level annotations, which may prevent customers from efficiently identifying new species, that is, different species within the same genus. In addition, an unknown genus does not mean that annotations at higher taxonomic levels are also unknown. Many particularly novel species can indeed only be annotated to the order, family, or another higher level, so bins annotated as NA cannot be scientifically evaluated for lack of information, and the chance of discovering a target new species may even be missed. Moreover, given the bioinformatics skills of mainstream customers, it is not feasible for them to re-analyze and identify all NA bins;
[0018] 4) Lack of visual form
Only the simplest tabular output is provided, without considering the user's need to extract and summarize information efficiently after obtaining the bins;
[0019] 5) The existing process analysis lacks the follow-up data mining of bin
At present, once binning is complete, each bin is regarded as a potential genome and the analysis stops, or only the genome sequence from bin reassembly is provided; data mining ideas and results based on the target bin are not provided.
Binning is only an intermediate step and the starting point for subsequent analysis; the real data mining has not yet begun.
Because bin analysis moves from the total DNA of the community to the "genome", it enters a completely different field of analysis. This makes later in-depth analysis very unfriendly to users and demands too much biological background from them.
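
The point made under [0017], that a bin reported as NA at the genus level may still be annotated at higher ranks, can be illustrated with a short sketch. It assumes each bin's lineage is available as a mapping from rank name to taxon name; the rank list, the `deepest_annotation` helper, and the example lineage are hypothetical and are not taken from the patent.

```python
from typing import Dict, Optional, Tuple

# Ranks from most general to most specific (an assumed ordering for illustration).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def deepest_annotation(lineage: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Return (rank, taxon) for the most specific rank that is annotated.

    Missing ranks, None, empty strings, and "NA" all count as unannotated.
    """
    for rank in reversed(RANKS):
        taxon = lineage.get(rank)
        if taxon and taxon != "NA":
            return rank, taxon
    return "unclassified", "NA"


# Hypothetical bin: unknown at the genus level, but resolved down to family.
bin_lineage = {
    "domain": "Bacteria",
    "phylum": "Firmicutes",
    "class": "Clostridia",
    "order": "Clostridiales",
    "family": "Lachnospiraceae",
    "genus": "NA",
}

rank, taxon = deepest_annotation(bin_lineage)
print(f"Deepest annotated rank: {rank} ({taxon})")
```

Reporting the deepest annotated rank instead of a bare NA is one possible way to keep such bins available for evaluation; the patent's own annotation procedure may differ.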



Examples


Embodiment 1

[0172] Example 1 (Assembly Strategy 2)

[0173] Non-natural-environment samples: four different human intestinal samples (about 6 Gb of data in total) were assembled. In the prior art, one of the better assembly results for intestinal samples is reported in Han M, Yang P, Zhong C, et al. The Human Gut Virome in Hypertension [J]. Frontiers in Microbiology, 2018, 9, where the assembly of the human gut metagenomic data has an N50 of 4152 bp. In this embodiment, assembly used the k-mer parameters 27, 37, 47, 57, 67, 77, 87, 97, 107, 117, 127. The N50 obtained in this embodiment is higher than that of the prior art, indicating a better assembly.

[0174]
Sample    Quantity  Total length  Average length  N50   N90   Maximum length  Minimum length  GC
Sample 1  27273     87611145      3212.38         4785  1263  371303          1000            48.54%
Sample 2  26592     88292244      3320.26         4967  1329  124147          1000            57.51%
Sample 3  17137     58929791      3438.75         5418  1315  277887...
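
The k-mer series and the N50/N90 statistics used in this example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names `kmer_list` and `nx` are hypothetical, the contig lengths are toy values, and the source does not name the assembler that receives the k-mer list.

```python
from typing import List, Sequence


def kmer_list(start: int = 27, stop: int = 127, step: int = 10) -> List[int]:
    """Build the k-mer series 27, 37, ..., 127 quoted in [0173]."""
    return list(range(start, stop + 1, step))


def nx(lengths: Sequence[int], percent: int) -> int:
    """Return the Nx statistic, for example N50 (percent=50) or N90 (percent=90).

    Contigs are sorted from longest to shortest; Nx is the length of the
    contig at which the running total first reaches `percent` of the total
    assembly length. Integer arithmetic avoids floating-point edge cases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length
    raise AssertionError("unreachable for percent <= 100")


# Toy contig lengths, for illustration only (not data from the patent).
toy_lengths = [1000, 2000, 3000, 4000, 5000]

print("k-mer list:", ",".join(str(k) for k in kmer_list()))
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

A higher N50 means that half of the assembled bases lie in longer contigs, which is why the example treats it as the indicator of a better assembly.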

Embodiment 2

[0176] Example 2 (Assembly Strategy 4)

[0177] Data from three natural-environment water samples (GW, HE, HW) reported in the prior art literature (Zhang M, Pan L, Huang F, et al. Metagenomic analysis of composition, function and cycling processes of microbial community in water, sediment and effluent of Litopenaeus vannamei farming environments under different culture modes [J]. Aquaculture, 2019, 506: 280-293) were processed with the method of the present invention and with the prior-art parameters, and the results were compared with the literature data. The results are as follows:

[0178] (1) GW water sample: the literature reports a total length of 288 496 and an N50 of 954. The sample data were assembled with both the prior-art parameters and the technique of the present invention, and the resulting data are as follows. The N50 obtained with the present invention is 1027, a better assembly than with the prior-art parameters.

[0179]

[0180]

[0181] (2) HE water sample: the literature reports a total length of 356 232 and an N50 of 914; the sample...

Embodiment 3

[0185] Example 3 (Assembly Strategy 3)

[0186] Natural-environment samples: three sediment samples, each with about 6G of data. The reads of all samples were pooled for a mixed assembly, and the mixed assembly performed better.

[0187]
Sample        Quantity  Total length  Average length  N50   N90  Maximum length  Minimum length  GC
Sample 1      177935    289272894     1625.72         2560  618  755937          500             62.61%
Sample 2      177055    287062202     1621.32         2559  617  755937          500             62.63%
Sample 3      174799    283115279     1619.66         2579  616  755937          500             62.64%
Mix assembly  325995    578768707     1775.39         3070  646  1275998         500             62.52%
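
As a worked check of the table in [0187], the sketch below recomputes each average length as total length divided by contig count and compares the N50 of the mixed assembly with the best single-sample N50. The row values are copied from the table; the helper names `average_length` and `n50_gain` are hypothetical.

```python
# Rows copied from table [0187]:
# (name, contig count, total length, reported average length, N50)
ROWS = [
    ("Sample 1", 177935, 289272894, 1625.72, 2560),
    ("Sample 2", 177055, 287062202, 1621.32, 2559),
    ("Sample 3", 174799, 283115279, 1619.66, 2579),
    ("Mix assembly", 325995, 578768707, 1775.39, 3070),
]


def average_length(contigs: int, total_length: int) -> float:
    """Average contig length is total assembled length over contig count."""
    return total_length / contigs


def n50_gain(rows) -> float:
    """Relative N50 gain of the mixed assembly over the best single sample."""
    single_n50 = [row[4] for row in rows if row[0] != "Mix assembly"]
    mixed_n50 = next(row[4] for row in rows if row[0] == "Mix assembly")
    best_single = max(single_n50)
    return (mixed_n50 - best_single) / best_single


for name, contigs, total, reported, _n50 in ROWS:
    computed = average_length(contigs, total)
    print(f"{name}: computed average {computed:.2f}, reported {reported}")

print(f"N50 gain of mixed assembly: {n50_gain(ROWS):.1%}")
```

The recomputed averages agree with the reported values to two decimal places, and the mixed assembly's N50 of 3070 is about 19% higher than the best single-sample N50 of 2579.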


Abstract

The invention discloses an optimized method for analyzing microbial communities through metagenome binning. The method comprises the following steps: filtering the sequencing data to obtain high-quality sequencing data; selecting an assembly strategy according to the source of the sample and the sequencing data volume to obtain contigs; and analyzing the gene data. Compared with the prior art, the method performs, for a microbial community, both bioinformatics analysis of the whole community and single-bacterium genome analysis that does not depend on isolation and culture. At the metagenome level, the invention provides an efficient, high-quality assembly approach better suited to the sample characteristics and the sequencing data volume, includes rich and comprehensive analysis content, and visualizes individuality and novelty. The method takes metagenome analysis from the community level to the single-bacterium level: the scheme includes data correction that improves accuracy and a comprehensive summary of bin information, which helps screen valuable target bins more conveniently and efficiently, and it further includes systematic mining ideas for the subsequent analysis of target bins.

Description

Technical field
[0001] The present invention relates to the field of high-throughput sequencing and bioinformatics analysis of sequencing data, and in particular to an optimized method for analyzing microbial communities through metagenome binning.
Background
[0002] Microbial communities are diverse, for example airborne microbes and intestinal microorganisms. With the spread of high-throughput sequencing technology, metagenomic sequencing has become widely used for studying the diversity and function of microbial communities; it can obtain the DNA information of all microbial species within a community.
[0003] Metagenome binning is the process of sorting the mixed sequences obtained by metagenomic sequencing, or the contigs assembled from different organisms, into separate groups by species. Conventionally, the complete genome sequence of a single species is obtained by first isolating a pure culture and then performing de novo genome sequencing, but there are ...


Application Information

IPC(8): G16B30/10, G16B30/20
CPC: G16B30/10, G16B30/20
Inventor: 夏昊强, 高川, 周煌凯, 艾鹏, 张秋雪
Owner GUANGZHOU GENE DENOVO BIOTECH