Method for detecting alternative splice isoforms in third-generation full-length transcriptomes

A detection method for alternative splice isoforms, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses redundant gene annotation results and mispositioned or missing splice sites, giving annotation that is more accurate, more complete, and more reliable.

Active Publication Date: 2016-03-09
嘉兴菲沙基因信息有限公司
Cites: 6. Cited by: 16.

AI Technical Summary

Problems solved by technology

[0002] GMAP, an existing tool for aligning transcripts to a genome, can directly output a gene-model GFF file (GFF: a data format for describing sequence features) for flnc reads (flnc: full-length non-chimeric transcripts). However, its output reports alignment information for each sequence separately. If this output is used directly as the gene annotation, it contains too many false positives and duplicates: 1) Overall coverage and alignment rate cannot guarantee that splice sites are accurate. Most errors in transcripts obtained by third-generation sequencing are insertions and deletions (indels), and indels near an exon boundary easily cause the splice site to be mispositioned. 2) Because a gene is expressed in many copies, many sequences correspond to the same gene model, so the annotation results contain a great deal of redundancy.
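
The redundancy described in point 2 can be illustrated with a minimal sketch that groups alignments sharing the same intron chain into one gene model. This is not the method claimed in the patent; the function names, the tuple layout of an alignment, and the coordinates are all hypothetical, and exact matching of splice sites is a simplification that ignores the indel problem from point 1.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exon intervals."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )


def collapse_by_intron_chain(alignments):
    """Group read IDs by (chromosome, strand, intron chain).

    `alignments` is an iterable of (read_id, chrom, strand, exons) tuples.
    This layout is an assumption for illustration, not a GMAP output format.
    """
    groups = defaultdict(list)
    for read_id, chrom, strand, exons in alignments:
        groups[(chrom, strand, intron_chain(exons))].append(read_id)
    return dict(groups)


# Hypothetical alignments: read1 and read2 differ only at their outer ends,
# read3 skips the middle exon and is therefore a different model.
alignments = [
    ("read1", "chr1", "+", [(100, 200), (300, 400), (500, 650)]),
    ("read2", "chr1", "+", [(110, 200), (300, 400), (500, 640)]),
    ("read3", "chr1", "+", [(100, 200), (500, 650)]),
]
models = collapse_by_intron_chain(alignments)
```

Grouping on the intron chain rather than on full exon coordinates tolerates small differences at transcript ends. Single-exon reads all share an empty chain per chromosome and strand, so a real pipeline would need a separate rule for them.
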
[0003] Cufflinks, software for comparing and merging results with the reference genome annotation, can compare two sets of annotation results and classify each gene model relative to the reference annotation, for example as novel (a new gene, not among the genes already annotated on the reference sequence), contained (included within an annotated gene on the reference sequence but shorter than it), or isoform (alternative splice isoform). The contained class includes gene structures that lack exons at the 5' or 3' end relative to the reference. Because the Isoseq experimental procedure (Isoseq: the third-generation transcriptome sequencing workflow) can ensure that the 3' end is intact, a missing 3' exon corresponds to a new isoform, whereas a missing 5' exon is likely caused by degradation during the experiment. The contained class therefore mixes novel isoforms with incomplete transcripts, and Cufflinks itself does not distinguish between them.
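
The distinction drawn above between a missing 3' exon and a missing 5' exon can be sketched as a small strand-aware check. This is an illustration under stated assumptions, not the patent's procedure: transcripts are represented by intron chains in genomic order, "contained" is taken to mean that the query chain is a contiguous sub-chain of the reference chain, and the function name and labels are hypothetical.

```python
def classify_contained(query_introns, ref_introns, strand):
    """Label a query whose intron chain is a contiguous part of the reference.

    Both chains are lists of (start, end) introns in genomic order.
    Returns one of: "full_match", "candidate_novel_isoform",
    "possible_5prime_degradation", "not_contained".
    """
    q, r = list(query_introns), list(ref_introns)
    n = len(q)
    if n == 0 or n > len(r):
        return "not_contained"
    for offset in range(len(r) - n + 1):
        if r[offset:offset + n] == q:
            break
    else:
        return "not_contained"

    missing_left = offset
    missing_right = len(r) - n - offset
    # The genomic left end is the 5' end on "+" and the 3' end on "-".
    if strand == "+":
        missing_5p, missing_3p = missing_left, missing_right
    else:
        missing_5p, missing_3p = missing_right, missing_left

    if missing_3p > 0:
        # The 3' end is assumed intact, so a shorter 3' end is taken as real.
        return "candidate_novel_isoform"
    if missing_5p > 0:
        return "possible_5prime_degradation"
    return "full_match"


ref = [(200, 300), (400, 500), (600, 700)]
label_a = classify_contained([(400, 500), (600, 700)], ref, "+")
label_b = classify_contained([(200, 300), (400, 500)], ref, "+")
```

In this sketch a query missing exons at both ends is labeled a candidate novel isoform, since the 3' difference alone is treated as informative. Whether that is the right call for a given data set is a design choice.
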


Examples

Embodiment 1

[0044] Example 1. A method for detecting alternative splice isoforms in a third-generation full-length transcriptome. This embodiment is described below with reference to Figure 1 and Figure 2.

[0045] Referring to Figure 1: S1. Using the SMRT workflow, remove the adapters from the raw circular sequencing reads and merge them into single-molecule transcript sequences, then screen third-generation full-length transcript sequences from the single-molecule transcript sequences.

[0046] Specifically, using the SMRT_Analysis IsoSeq workflow, the adapters are removed from the raw circular sequencing reads, the adapter-free reads are merged into high-quality single-molecule transcript sequences, and third-generation full-length transcript sequences are screened from the single-molecule transcript sequences.

[0047] S2. Use second-generation sequencing data to correct errors in the screened third-generation full-length transcript sequences.

[...


Abstract

The invention discloses a method for detecting alternative splice isoforms in a third-generation full-length transcriptome. The method comprises the following steps: removing the adapters from the raw circular sequencing reads and merging them into single-molecule transcript sequences, then screening third-generation full-length transcript sequences; aligning the third-generation full-length transcript sequences to a reference genome sequence and keeping those whose coverage and similarity to the reference genome sequence both exceed preset thresholds; filtering the retained sequences for splicing false positives and for DNA contamination; and performing gene annotation and alternative splice isoform annotation on the filtered third-generation full-length transcript sequences. The very long reads of third-generation sequencing are long enough to span most RNAs, so sequencing a transcriptome with SMRT yields third-generation full-length transcript sequences without assembly. Third-generation transcriptome sequencing can therefore recover the splicing structure of a gene effectively and support more complete gene model annotation.
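
The threshold screening step in the abstract can be sketched as follows. The abstract only says "preset thresholds", so the default values of 0.9, the `Alignment` fields, and the way coverage and identity are computed are all assumptions for illustration, not values or definitions taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical per-read alignment summary."""
    read_id: str
    read_length: int    # total length of the transcript sequence
    aligned_bases: int  # bases of the read placed on the genome
    matches: int        # aligned bases identical to the genome

    @property
    def coverage(self) -> float:
        return self.aligned_bases / self.read_length

    @property
    def identity(self) -> float:
        return self.matches / self.aligned_bases if self.aligned_bases else 0.0


def filter_alignments(alignments, min_coverage=0.9, min_identity=0.9):
    """Keep alignments whose coverage and identity both exceed the thresholds."""
    return [
        a for a in alignments
        if a.coverage > min_coverage and a.identity > min_identity
    ]


candidates = [
    Alignment("read1", 1000, 980, 970),  # high coverage, high identity
    Alignment("read2", 1000, 700, 690),  # low coverage
    Alignment("read3", 1000, 950, 800),  # low identity
]
kept = filter_alignments(candidates)
```

Only `read1` passes both thresholds here. Passing this screen does not by itself guarantee accurate splice sites, which is why the method applies the splicing false-positive and DNA contamination filters afterwards.
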

Description

Technical field

[0001] The invention relates to the technical field of gene detection, and in particular to a method for detecting alternative splice isoforms in third-generation full-length transcriptomes.

Background

[0002] GMAP, an existing tool for aligning transcripts to a genome, can directly output a gene-model GFF file (GFF: a data format for describing sequence features) for flnc reads (flnc: full-length non-chimeric transcripts). However, its output reports alignment information for each sequence separately. If this output is used directly as the gene annotation, it contains too many false positives and duplicates: 1) Overall coverage and alignment rate cannot guarantee that splice sites are accurate. Most errors in transcripts obtained by third-generation sequencing are insertions and deletions (indels), and indels near an exon boundary can easily cause errors in the posit...

Application Information

IPC(8): G06F19/22
CPC: G16B30/00
Inventor: 刘红芳
Owner: 嘉兴菲沙基因信息有限公司