High-throughput sequencing quality control analysis method based on Snakemake, with fast batch processing and automatic email feedback of results

A high-throughput sequencing technology applied in sequence analysis, office automation, data visualization, and related fields. It addresses the lack of a process monitoring mechanism, slow single-sample analysis, and the inability to process samples in batches, and it makes error queries convenient.

Active Publication Date: 2022-07-26
SHANGHAI OE BIOTECH CO LTD
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The current general analysis method first uses Trimmomatic to filter out low-quality sequences and sequencing adapters, then uses FastQC to visualize data quality. It can only process one sample at a time, so quality control for high-throughput sequencing data with many samples may take several days or even a month. The analysis results cannot be fed back quickly, and there is no process monitoring mechanism, which makes data analysis a major bottleneck in related research.
[0005] The existing high-throughput sequencing quality control workflow has the following defects: (1) Slow single-sample analysis: it takes a long time for a single sample to go from raw data to filtered quality control results; (2) No batch processing: only single-sample quality control is supported, and multiple samples cannot be processed in parallel; (3) Untimely feedback of analysis results: manual verification is required after the process completes, and results cannot be fed back promptly by email; (4) No error detection mechanism: there is no mechanism for detecting whether a single sample ran successfully; (5) No visualization of the analysis process: the analysis process has no intuitive visual display; (6) Incomplete result display: the analysis results are too simple and lack visualizations corresponding to the data.

Examples

Embodiment

[0059] Taking three samples, A1, A2, and A3, as an example, the process of the present invention is described as follows:

[0060] 1. Receive the raw high-throughput sequencing data for samples A1, A2, and A3 from the user;

[0061] 2. Use fastp to perform quality control filtering on the raw data of each of the samples A1, A2, and A3; see Figures 2, 3, and 4;
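The per-sample fastp step could be run in parallel roughly as follows. This is a minimal sketch, not the patent's Snakemake workflow: the directory names, the `_R1`/`_R2` file naming, and the helper names `fastp_command`, `run_sample`, and `run_all` are assumptions made for illustration. Only the fastp options for paired input and output (`-i`, `-I`, `-o`, `-O`), the JSON and HTML reports (`-j`, `-h`), and the thread count (`-w`) are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SAMPLES = ["A1", "A2", "A3"]  # sample names from the embodiment


def fastp_command(sample, raw_dir="raw", out_dir="clean", threads=4):
    """Build the fastp command line for one paired-end sample.

    The directory layout and file naming are hypothetical; out_dir must
    already exist before the command is run.
    """
    raw, out = Path(raw_dir), Path(out_dir)
    return [
        "fastp",
        "-i", str(raw / f"{sample}_R1.fastq.gz"),
        "-I", str(raw / f"{sample}_R2.fastq.gz"),
        "-o", str(out / f"{sample}_R1.clean.fastq.gz"),
        "-O", str(out / f"{sample}_R2.clean.fastq.gz"),
        "-j", str(out / f"{sample}.fastp.json"),
        "-h", str(out / f"{sample}.fastp.html"),
        "-w", str(threads),
    ]


def run_sample(sample, runner=subprocess.run):
    """Run fastp for one sample and report whether it exited cleanly."""
    result = runner(fastp_command(sample), capture_output=True, text=True)
    return sample, result.returncode == 0


def run_all(samples=SAMPLES, max_workers=3, runner=subprocess.run):
    """Run all samples concurrently; returns {sample: succeeded}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda s: run_sample(s, runner), samples))
```

The `runner` parameter exists only so the scheduling logic can be exercised without fastp installed; in normal use `run_all()` calls `subprocess.run` directly.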

[0062] Figure 2 is the distribution map of the average sequence error rate: the abscissa is the base position along reads R1 and R2, and the ordinate is the average error rate at each base position;
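An average error rate per base position can be derived from FASTQ quality strings using the standard Phred relation, error = 10^(-Q/10). The sketch below assumes Phred+33 encoding and averages the per-read error probabilities at each position; the source does not state how the plotted values are actually computed, so this is an illustration of the quantity rather than the tool's implementation. The function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mean_error_by_position(quality_strings, offset=33):
    """Average error probability at each base position.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    Reads may differ in length; each position is averaged over the reads
    that are long enough to cover it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0.0)
                counts.append(0)
            totals[i] += phred_to_error(ord(ch) - offset)
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

For example, `mean_error_by_position(["I5", "I+"])` averages Q40 and Q40 at the first position and Q20 and Q10 at the second.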

[0063] Figure 3 is a pie chart of sequence composition: the legend gives the number and percentage of high-quality sequences, of low-quality sequences, of sequences containing too many N bases, and of sequences that are too short, each percentage being relative to all sequences;
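The counts and percentages shown in such a pie chart could be computed from a fastp JSON report along these lines. This is a sketch under assumptions: the key names are those of the `filtering_result` section of fastp's JSON report as commonly produced and should be checked against the fastp version in use, and the display labels and the `composition` helper are hypothetical.

```python
# Assumed keys of the "filtering_result" section of a fastp JSON report,
# mapped to the legend labels described for the pie chart.
CATEGORIES = {
    "passed_filter_reads": "high quality",
    "low_quality_reads": "low quality",
    "too_many_N_reads": "too many N",
    "too_short_reads": "too short",
}


def composition(filtering_result):
    """Return {label: (count, percent of all counted sequences)}."""
    counts = {
        label: int(filtering_result.get(key, 0))
        for key, label in CATEGORIES.items()
    }
    total = sum(counts.values())
    return {
        label: (n, 100.0 * n / total if total else 0.0)
        for label, n in counts.items()
    }
```

In use, the argument would be `json.load(open(path))["filtering_result"]` for one sample's report; the resulting pairs can be passed directly to a plotting library as pie chart values and labels.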

[0064] Figure 4 is the base content distribution map: the abs...

Abstract

The invention discloses a Snakemake-based high-throughput sequencing quality control analysis method that processes samples in fast batches and automatically feeds results back by email. The method comprises the following steps: file preparation; multi-sample parallel fastp quality control filtering; monitoring of each single-sample fastp run; summary of the fastp quality control results of all samples; email delivery of the quality control summary; multi-sample parallel FastQC detection; integration of all sample results; and plotting of the analysis workflow. The method processes samples in batches and produces comprehensive results. It automatically organizes all analysis results and performs statistical summary and visualization, and every operation step can be traced back to its source, which makes error queries convenient.
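The monitoring and email-summary steps could look roughly like the sketch below, which turns a per-sample success map into a plain-text summary message using Python's standard `email` and `smtplib` modules. The addresses, SMTP host, subject wording, and function names are placeholders, and the source does not describe the actual message format, so this is only an illustration.

```python
import smtplib
from email.message import EmailMessage


def build_summary_email(status, sender, recipient):
    """Build a summary email from {sample: succeeded} run statuses."""
    failed = [s for s, ok in status.items() if not ok]
    lines = [
        f"{s}\t{'OK' if ok else 'FAILED'}"
        for s, ok in sorted(status.items())
    ]
    msg = EmailMessage()
    passed = len(status) - len(failed)
    msg["Subject"] = f"fastp QC summary: {passed}/{len(status)} samples passed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(["sample\tstatus", *lines]))
    return msg


def send(msg, host="localhost", port=25):
    """Send the message through an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Building the message separately from sending it keeps the summary logic testable without a mail server; a failed sample appears explicitly in the body so it can be traced and rerun.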

Description

Technical field

[0001] The invention belongs to the technical field of high-throughput microbial sequencing and relates to a Snakemake-based high-throughput sequencing quality control analysis method that processes samples in fast batches and automatically feeds results back by email.

Background art

[0002] High-throughput sequencing, also known as "next-generation sequencing", is a departure from traditional sequencing. Compared with traditional Sanger sequencing, the throughput of next-generation sequencing technology has increased by one to two orders of magnitude, which makes it economical to sequence genomes at high fold coverage. As the performance of high-throughput sequencing instruments stabilizes and prices continue to fall, their applications are becoming increasingly widespread. Research based on high-throughput sequencing data will therefore show a rapid development trend in quantity and applicatio...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/00, G16B40/00, G16B45/00, G16B50/00, G06Q10/10
CPC: G16B30/00, G16B40/00, G16B45/00, G16B50/00, G06Q10/107
Inventor: 张建明, 顾胤聪, 肖云平, 史贤俊, 刘钰钏, 林博
Owner: SHANGHAI OE BIOTECH CO LTD