A method and system for analyzing metagenomic data

A metagenomic data analysis method and system for the field of metagenomic data analysis. It addresses the excessive false positives, poor specificity, and long analysis times of existing approaches, and it eliminates false-positive results, reduces the amount of computation, and keeps computation time under control.

Active Publication Date: 2019-02-12
SIMCERE DIAGNOSTICS CO LTD +2

AI Technical Summary

Problems solved by technology

[0005] 1) High-throughput metagenomic detection is highly sensitive, but its results contain too many false positives and its specificity is poor, so it cannot meet the needs of applications that require high specificity, e.g. clinical identification of pathogenic microorganisms.
[0006] 2) Existing metagenomic sequencing data analysis methods still struggle to substantially speed up analysis and shorten analysis time while preserving the accuracy of identification results.
[0007] 3) Existing metagenomic data analysis platforms have poor compatibility and cannot be applied generally across sequencing scenarios.
[0008] 4) Existing metagenomic analysis technology cannot integrate species identification with functional gene analysis, and so cannot provide more comprehensive, more deeply processed analysis results.

Examples

Embodiment 1

[0082] Example 1: Metagenomic detection and data analysis of cardiac vegetation samples on the Nanopore sequencing platform

[0083] Cardiac vegetation samples A1-A7 were collected from 7 culture-negative infective endocarditis patients undergoing valve replacement surgery and stored in a -80°C freezer.

[0084] Nucleic acid was extracted from the samples as follows: each vegetation sample was taken out of the freezer and left at room temperature for 30 minutes, then cut into pieces with sterilized scissors, and nucleic acid was extracted with the TIANamp Micro DNA kit according to the kit instructions.

[0085] Libraries were constructed from the extracted nucleic acid samples and sequenced according to the following procedure. Library construction followed the 1D Native barcoding protocol provided by Oxford Nanopore:

[0086] 1) Use g-TUBE (Covaris) to disrupt 1.2 μg nucleic acid sample at ...

Embodiment 2

[0107] Example 2: Metagenomic detection and data analysis of cardiac vegetation samples on the Illumina sequencing platform

[0108] Using A1-A2 from Example 1 as samples, genomic nucleic acid was extracted, a library was constructed, and sequencing was performed on Illumina HiSeq PE150. Sequence information in fastq format was obtained from the sequencing data. The data analysis of each sample was carried out as follows:

[0109] 1) Adapters and sequences with a high N ratio are removed from the fastq-format data generated by Illumina sequencing, and the data then proceeds to quality assessment.
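
The N-ratio part of this step can be illustrated with a minimal Python sketch. The source does not state the cutoff for a "high" N ratio or the tool used, so the `max_n_ratio` value of 0.1 and all function names here are hypothetical; adapter trimming is not shown.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def n_ratio(seq: str) -> float:
    """Fraction of bases called as N; an empty read counts as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def drop_high_n(records: Iterable[FastqRecord],
                max_n_ratio: float = 0.1) -> Iterator[FastqRecord]:
    """Keep only reads whose N ratio does not exceed max_n_ratio (assumed cutoff)."""
    for rec in records:
        if n_ratio(rec[1]) <= max_n_ratio:
            yield rec
```

In practice the same generator chain can be fed an open file handle, e.g. `drop_high_n(parse_fastq(open(path)))`, where `path` is whatever fastq file is being processed.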

[0110] 2) Sequencing quality assessment. The read length of this data is 150 bp, and sequences with a length below 100 bp and an average sequencing quality below 25 are filtered out. If the GC ratio of the first 10 bases of the data is abnormal, the first 10 bases of each sequence will...

Embodiment 3

[0123] Example 3: Drug-resistance gene detection in cardiac vegetation samples on the BGI sequencing platform

[0124] Using A1-A2 from Example 1 as samples, genomic nucleic acid was extracted, a library was constructed, and sequencing was performed on the BGI sequencing platform. The data generated for each sample was analyzed as follows:

[0125] 1) Adapters and sequences with a high N ratio are removed from the fastq-format data generated by BGI sequencing, and the data then proceeds to quality assessment.

[0126] 2) Sequencing quality assessment. The read length of the library was 150 bp, and sequences with a length of <100 bp and an average sequencing quality of <25 were filtered out.
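
The length and quality thresholds in this step can be sketched in Python as follows. This is an illustration only: it assumes Phred+33 quality encoding, and it reads the criterion as "drop a read if it fails either threshold", which the source text does not state unambiguously. The function names are hypothetical.

```python
def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_quality(seq: str, qual: str,
                   min_len: int = 100, min_mean_q: float = 25.0) -> bool:
    """True if the read is at least min_len bases long and its mean quality
    is at least min_mean_q. Thresholds follow the text (100 bp, Q25);
    treating them as independent filters is an assumption."""
    return len(seq) >= min_len and mean_phred(qual) >= min_mean_q


def filter_reads(records):
    """Keep (header, seq, qual) tuples that pass both thresholds."""
    return [r for r in records if passes_quality(r[1], r[2])]
```

If the intended rule is instead to drop only reads that fail both thresholds at once, the `and` in `passes_quality` would become `or`.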

[0127] 3) Host sequence removal. Reads are aligned to the human genome (genome version HG38), and the reads that fail to align are retained for the next step of analysis.
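
One common way to implement this retention step is to read the aligner's SAM output and keep the records whose FLAG has the 0x4 (segment unmapped) bit set. The source does not name the aligner or the output format, so the sketch below assumes SAM text input; the function name is hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unaligned_reads(sam_lines):
    """Return (name, sequence, quality) for reads that did not align to the host.

    sam_lines: iterable of SAM text lines, e.g. from aligning reads to the
    human reference. Header lines (starting with '@') are skipped.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & UNMAPPED:
            kept.append((fields[0], fields[9], fields[10]))
    return kept
```

For paired-end data, a stricter variant would keep a pair only when both mates are unmapped; that choice is not specified in the text.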

[0128] 4) The "two-step method" is used to identify the p...

Abstract

The invention relates to a metagenomic data analysis method and system. The method and system obtain a preliminary species identification result for the sample using a k-mer algorithm, extract some or all of the sequences supporting that result, verify the preliminary result with the BLAST algorithm, and judge whether each preliminarily identified species should be reported as detected. The method and system reduce false positives, obtain the reported species of a sample quickly and accurately, and are compatible with a variety of mainstream sequencing platforms, covering both second-generation and third-generation sequencing technologies. They can also accurately identify drug-resistance genes and drug-resistance mutation sites in the sample and map them to the reported species. Furthermore, the system can be used to identify pathogenic microorganisms, especially endocarditis pathogens, overcoming the difficulty of culturing them.
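
The two-step idea in the abstract (fast k-mer screening, then verification of the supporting sequences) can be illustrated with a toy Python sketch. It is not the patented implementation: the reference sequences, the k value, and the thresholds are made up, and a naive ungapped identity check stands in for BLAST, which is not run here.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of species whose reference contains it."""
    index = defaultdict(set)
    for species, ref in references.items():
        for km in kmers(ref, k):
            index[km].add(species)
    return index


def classify_read(read, index, k):
    """Step 1: species with the most k-mer hits for this read, or None."""
    votes = Counter()
    for km in kmers(read, k):
        for species in index.get(km, ()):
            votes[species] += 1
    return votes.most_common(1)[0][0] if votes else None


def best_identity(read, ref):
    """Stand-in for BLAST: best ungapped identity of read within ref."""
    if not read or len(read) > len(ref):
        return 0.0
    best = 0
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        best = max(best, sum(a == b for a, b in zip(read, window)))
    return best / len(read)


def two_step_report(reads, references, k=5, min_identity=0.9, min_verified=1):
    """Step 1 assigns reads by k-mer votes; step 2 re-checks each supporting
    read and reports a species only if enough reads pass verification."""
    index = build_index(references, k)
    support = defaultdict(list)
    for read in reads:
        species = classify_read(read, index, k)
        if species is not None:
            support[species].append(read)
    reported = {}
    for species, sp_reads in support.items():
        verified = [r for r in sp_reads
                    if best_identity(r, references[species]) >= min_identity]
        if len(verified) >= min_verified:
            reported[species] = len(verified)
    return reported
```

The point of the design is that a read sharing only one short k-mer with a reference can be assigned to that species in step 1, but it fails the stricter whole-read check in step 2, so the species is not reported.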

Description

Technical field

[0001] The present invention relates to the field of bioinformatics, in particular to a metagenomic data analysis method and system.

Background

[0002] The metagenome, also known as the community genome, is the sum of the genetic material of all microorganisms in a specific niche. Metagenomics is the discipline that applies genomics technology directly to the study of microbial communities in their niches, without the need to isolate and culture individual strains.

[0003] Unlike earlier microbiological analysis methods, metagenomic analysis does not need to screen cultures of each microbial community. Instead, it directly determines the nucleic acid sequences of all microorganisms in the sample to analyze the growth of the microbial community. Metagenomic analysis can avoid the bias caused by changes in microbial sequences due to environmental changes, and is especially suitable for identifying microorganisms that are difficult to cu...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/10
CPC: G16B30/00
Inventor 康悦胡欢程军周洲任用
Owner SIMCERE DIAGNOSTICS CO LTD