
An optimized metagenomic binning method for analyzing microbial communities

A metagenomics and microbiota technology applied in the field of high-throughput sequencing and bioinformatics analysis of sequencing data. It addresses problems that affect bin accuracy, heavy use of computing resources, and single annotation strategies, so as to achieve accurate and reliable bin results through data correction and to increase the depth of research.

Active Publication Date: 2022-03-29
GUANGZHOU GENE DENOVO BIOTECH

AI Technical Summary

Problems solved by technology

[0011] 1) Single assembly strategy
The sample type (such as natural environment samples or symbiotic bacteria samples), the amount of sequencing data per sample (such as 6G, 10G, 20G, 100G, etc.), the number of biological replicates, and similar factors are not fully considered. The default assembly parameters of the software are used directly, without testing or evaluation, which leads to untargeted assembly, low computing efficiency, and higher resource usage;
[0012] 2) Annotation uses the default parameters of the software, without comparison or optimization, and the annotation strategy is single
Although some patents already describe a binning analysis workflow based on metagenomics, they do not include a data correction step, which is time-consuming and resource-intensive. Its absence directly affects the accuracy of bin clustering and bin identification;
[0016] 3) It is difficult to provide detailed bin gene annotations
[0017] 3) The reference value of bin species annotation results is extremely low
The default output of the existing workflow is the genus-level annotation of each bin, or NA when the genus is unknown. On the one hand, showing only the genus level is too abstract: the user gains no intuitive understanding and cannot obtain useful reference information. To see annotations at other taxonomic levels, the user has to look them up in a database, and since one sample may generally yield hundreds of high-quality bins, the workload is relatively large. On the other hand, a single-bacterium genome can be treated as a specific species, but the existing workflow does not provide species-level annotations, which may prevent customers from efficiently distinguishing different species of the same genus, including new ones. In addition, an unknown genus does not mean that annotations at higher taxonomic levels are unknown. Many particularly novel species can indeed only be annotated to the order, family, or another higher level, so bins annotated as NA cannot be scientifically evaluated for lack of information, and a target new species may even be missed. Moreover, given the bioinformatics skills of mainstream customers, it is impractical for them to re-analyze and identify all NA bins;
[0018] 4) Lack of visual form
Only the simplest tabular output is provided, without considering the user's need to extract and summarize information efficiently after obtaining the bins
[0019] 5) The existing process analysis lacks the follow-up data mining of bin
At present, after binning analysis the bin is treated as a potential genome and the analysis stops there, or only the genome sequence from bin reassembly is provided; no data mining ideas or results based on the target bin are offered.
Binning is only an intermediate step and the starting point for subsequent analysis; the real data mining has not yet begun.
Because bin analysis moves from the total DNA of the community to the "genome", it enters a completely different field of analysis. This makes later in-depth analysis very unfriendly to users and demands too strong a background in biology.
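
To make the species-annotation problem in [0017] concrete, the following is a minimal sketch of reporting the most specific rank that is actually annotated, rather than collapsing every bin with an unknown genus to NA. The rank list, the lineage dictionary layout, the bin IDs, and the function names are all assumptions made for illustration; they are not taken from the patent.

```python
# Hypothetical rank order and lineage layout; not taken from the patent.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lowest_known_rank(lineage):
    """Return (rank, name) for the most specific rank that has an annotation.

    `lineage` maps rank names to taxon names. Missing entries, empty strings
    and the literal "NA" are all treated as unknown.
    """
    for rank in reversed(RANKS):
        name = lineage.get(rank)
        if name and name != "NA":
            return rank, name
    return None, "NA"


def describe_bin(bin_id, lineage):
    """Build a one-line summary that keeps higher-rank information."""
    rank, name = lowest_known_rank(lineage)
    if rank is None:
        return f"{bin_id}: NA (no rank annotated)"
    return f"{bin_id}: {name} ({rank} level)"


# Illustrative bins: bin.2 has no genus, but its order is still known.
bins = {
    "bin.1": {"domain": "Bacteria", "order": "Bacteroidales",
              "family": "Bacteroidaceae", "genus": "Bacteroides"},
    "bin.2": {"domain": "Bacteria", "order": "Bacteroidales",
              "family": "NA", "genus": "NA"},
    "bin.3": {},
}

for bin_id, lineage in bins.items():
    print(describe_bin(bin_id, lineage))
```

With this kind of summary, a bin whose genus is unknown still carries its order-level or family-level label, so it can be screened as a candidate for a new taxon instead of being discarded as NA.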


Examples

Embodiment 1

[0172] Example 1 (Assembly Strategy 2)

[0173] Non-natural environment samples: four samples from different human intestinal tracts (total data volume about 6Gb) were assembled. The literature reporting a relatively good assembly of intestinal samples with current technology is Han M, Yang P, Zhong C, et al. The Human Gut Virome in Hypertension[J]. Frontiers in Microbiology, 2018, 9: (see the section "Assembly of the Human Gut Metagenomic Data"), where the N50 is 4152 bp. The k-mer assembly parameters used in this example and in the reference are 27, 37, 47, 57, 67, 77, 87, 97, 107, 117, 127. The N50 obtained in this embodiment is higher than that of the prior art, so the assembly is better.
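
Since the comparison above rests on N50, a minimal sketch of how N50 and N90 are commonly computed from a list of contig lengths may help. The function name `nx` and the toy lengths are assumptions for illustration; the k-mer list simply reproduces the values quoted in [0173] (27 to 127 in steps of 10).

```python
def nx(lengths, fraction):
    """Return the length L such that contigs of length >= L together cover
    at least `fraction` of the total assembly length (N50 for 0.5, N90 for 0.9).
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest contig down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy contig lengths (total 30), not data from the patent.
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.5)  # 10 + 8 = 18 >= 15, so N50 is 8
n90 = nx(toy_lengths, 0.9)  # 10 + 8 + 6 + 4 = 28 >= 27, so N90 is 4

# The k-mer sizes listed in [0173].
kmers = list(range(27, 128, 10))

print(n50, n90, kmers)
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why the text treats it as the indicator of a better assembly.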

[0174]
Sample     Quantity  Total length (bp)  Average length (bp)  N50 (bp)  N90 (bp)  Maximum length (bp)  Minimum length (bp)  GC
sample 1   27273     87611145           3212.38              4785      1263      371303               1000                 48.54%
sample 2   26592     88292244           3320.26              4967      1329      124147               1000                 57.51%
sampl...

Embodiment 2

[0176] Example 2 (Assembly Strategy 4)

[0177] Data from three natural environment samples, namely water body samples (GW, HE, HW), were taken from the prior art literature (Zhang M, Pan L, Huang F, et al. Metagenomic analysis of composition, function and cycling processes of microbial community in water, sediment and effluent of Litopenaeus vannamei farming environments under different culture modes[J]. Aquaculture, 2019, 506:280-293.). They were assembled with the technical process of the present invention and with the existing parameters, and both results were compared with the data reported in the literature. The results are as follows:

[0178] (1) GW water body: the total length recorded in the literature is 288496, and the N50 is 954. The sample data were assembled with the existing technical parameters and with the technology of the present invention; the result data are as follows. The N50 obtained with the technology of the present invention is 1027, and the assembly is better than with the existing technical parameters.

[0179]

[0180]

[0181...

Embodiment 3

[0185] Example 3 (Assembly Strategy 3)

[0186] Three natural environment samples (sediment samples) were used, each with a data volume of 6G. The reads of the samples within the group were pooled and assembled together, and the mixed assembly gave the better result.

[0187]
Sample          Quantity  Total length (bp)  Average length (bp)  N50 (bp)  N90 (bp)  Maximum length (bp)  Minimum length (bp)  GC
sample 1        177935    289272894          1625.72              2560      618       755937               500                  62.61%
sample 2        177055    287062202          1621.32              2559      617       755937               500                  62.63%
sample 3        174799    283115279          1619.66              2579      616       755937               500                  62.64%
mixed assembly  325995    578768707          1775.39              3070      646       1275998              500                  62.52%
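
As a quick arithmetic check on the figures in [0187], the sketch below recomputes each average length from the total length and the sequence count, then compares the mixed-assembly N50 with the best single-sample N50. The variable names are illustrative; the numbers are copied from the table.

```python
# (quantity, total length, reported average length, N50) per row of [0187].
rows = {
    "sample 1": (177935, 289272894, 1625.72, 2560),
    "sample 2": (177055, 287062202, 1621.32, 2559),
    "sample 3": (174799, 283115279, 1619.66, 2579),
    "mixed assembly": (325995, 578768707, 1775.39, 3070),
}

# The reported average should equal total length / quantity, up to rounding.
for name, (quantity, total, reported_avg, _n50) in rows.items():
    recomputed = total / quantity
    assert abs(recomputed - reported_avg) < 0.01, name

# Compare the mixed assembly with the best single-sample assembly by N50.
best_single_n50 = max(n50 for name, (_, _, _, n50) in rows.items()
                      if name != "mixed assembly")
mixed_n50 = rows["mixed assembly"][3]
relative_gain = (mixed_n50 - best_single_n50) / best_single_n50

print(f"best single-sample N50: {best_single_n50}")
print(f"mixed assembly N50: {mixed_n50} ({relative_gain:.1%} higher)")
```

On these numbers the mixed assembly's N50 is about 19% higher than the best single-sample value, which is the basis for the statement that mixed assembly performs better here.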


Abstract

The present invention discloses an optimized metagenomic binning method for analyzing microbial communities. Sequencing data are filtered to obtain high-quality data, an assembly strategy is then selected according to the sample source and the amount of sequencing data to obtain contigs, and genetic data analysis is then performed. Compared with the prior art, the invention provides, for a microbial community, both bioinformatics analysis of the whole community and "single bacterium" genome analysis that does not rely on isolation and culture. At the metagenomic level, it provides an efficient, high-quality assembly approach better suited to the sample characteristics and the sequencing data volume, together with rich and comprehensive analysis content and personalized, novel visualization, taking metagenomic analysis from the community to the single bacterium. The scheme includes data correction that improves accuracy, and a comprehensive bin information summary that makes screening for valuable target bins more convenient and efficient. It also includes a comprehensive, systematic set of mining ideas for subsequent analysis of target bins.
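
The abstract's first step, filtering sequencing data to obtain high-quality data, can be illustrated with a minimal sketch. The patent text shown here does not state its filtering criteria, so the two rules below (a minimum mean Phred quality and a maximum fraction of N bases), their thresholds, and the function names are assumptions chosen only for illustration. Qualities are assumed to use the common Phred+33 encoding.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, min_mean_quality=20.0, max_n_fraction=0.1):
    """Yield (read_id, sequence, quality) records that pass both checks.

    Thresholds are hypothetical defaults, not values from the patent.
    """
    for read_id, seq, qual in records:
        if not seq or len(seq) != len(qual):
            continue  # skip empty or malformed records
        n_fraction = seq.upper().count("N") / len(seq)
        if n_fraction <= max_n_fraction and mean_phred(qual) >= min_mean_quality:
            yield read_id, seq, qual


# Toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # high quality, no N: kept
    ("r2", "ACGTNNNN", "IIIIIIII"),  # 50% N bases: dropped
    ("r3", "ACGTACGT", "########"),  # mean quality 2: dropped
]

kept_ids = [read_id for read_id, _, _ in filter_reads(reads)]
print(kept_ids)
```

In practice this step is usually delegated to a dedicated read-trimming tool; the sketch only shows the kind of rule such a filter applies before assembly.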

Description

Technical field
[0001] The invention relates to the field of high-throughput sequencing technology and bioinformatics analysis of sequencing data, in particular to an optimized metagenomic binning method for analyzing microbial communities.
Background
[0002] Microbial communities are of many kinds, such as air microorganisms, intestinal microorganisms, and soil microorganisms. With the spread of high-throughput sequencing technology, metagenomics, which sequences a community to study the diversity and functions of its microorganisms, is also on the rise. Metagenome sequencing can obtain the DNA information of all microbial species in a community.
[0003] Metagenome binning is the process of sorting the sequences, or the assembled contigs, obtained from metagenomic sequencing, which are a mixture from different organisms, into separate groups by species. The traditional whole genome sequence of a single species is obtained after pure culture and de novo sequencing of the whole ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/10, G16B30/20
CPC: G16B30/10, G16B30/20
Inventor: 夏昊强高川周煌凯艾鹏张秋雪
Owner: GUANGZHOU GENE DENOVO BIOTECH