
Transcript annotation method and method for screening long non-coding RNA and endogenous retrovirus-derived long non-coding RNA

A transcript annotation technology, applied in the field of bioinformatics, that addresses problems such as the low throughput of experimental methods.

Pending Publication Date: 2021-01-08
WENZHOU MEDICAL UNIV

AI Technical Summary

Problems solved by technology

Traditional 5' and 3' RACE (Rapid Amplification of cDNA Ends) is the best method for obtaining complete transcripts, but this experimental method is low-throughput.



Examples


Embodiment 1

[0040] Example 1. RSCS Annotated Transcripts

[0041] This example uses mouse embryonic fibroblasts (MEFs) to illustrate the transcript annotation method of the present invention.

[0042] 1. Obtain the raw RNA sequencing (RNA-seq) and small RNA sequencing (small RNA-seq) data for MEFs and iPSCs and for the first through the eighth day of cell reprogramming.

[0043] 1. Use trim_galore (0.4.5) or cutadapt (1.18) to remove adapters from the raw RNA-seq and small RNA-seq data to obtain clean data.

[0044] 2. Then use FastQC (v0.11.5) to perform quality control on the clean data obtained in step 1. The screening criteria are: 1) the sequencing quality score of each base is not less than 20; 2) the GC content of the sequences follows a normal distribution, with a deviation of no more than 15%; 3) the N content in the sequencing results does not exceed 5%; 4) the sequencing len...
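As a rough illustration of steps 1 and 2, the sketch below trims an adapter by exact match and then applies criteria 1 and 3 to a single read. It is not how trim_galore, cutadapt, or FastQC work internally: cutadapt uses error-tolerant matching, and FastQC reports quality metrics rather than filtering reads. The adapter sequence, the per-read interpretation of the thresholds, the Phred+33 encoding, and all function names are assumptions made for this example; criteria 2 and 4 are left out because the text does not specify them fully.

```python
from typing import List, Tuple

# Assumption: the source does not name the adapter; this is a commonly used
# Illumina adapter prefix, chosen only for illustration.
ADAPTER = "AGATCGGAAGAGC"
MIN_BASE_QUALITY = 20   # criterion 1 in the text
MAX_N_FRACTION = 0.05   # criterion 3 in the text


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact adapter match (no mismatches allowed)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def passes_qc(seq: str, qual: str) -> bool:
    """Apply criteria 1 and 3 per read: every base >= Q20 and at most 5% N."""
    if not seq:
        return False
    if min(phred_scores(qual)) < MIN_BASE_QUALITY:
        return False
    return seq.upper().count("N") / len(seq) <= MAX_N_FRACTION


# Hypothetical read: 10 bases of insert followed by adapter and trailing bases.
raw_seq = "ACGTACGTAC" + ADAPTER + "TTTT"
raw_qual = "I" * len(raw_seq)
clean_seq, clean_qual = trim_adapter(raw_seq, raw_qual)
read_ok = passes_qc(clean_seq, clean_qual)
```

In practice the trimming and reporting would be left to the named tools; the sketch only makes the two numeric thresholds concrete.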

Embodiment 2

[0053] Example 2. Method for screening long non-coding RNA

[0054] This example describes the screening of long non-coding RNAs using the annotation method in Example 1. The specific method is as follows:

[0055] 1. Annotate the transcripts; see Step 1 of Example 1 for the specific method.

[0056] 2. Use CPC2 and CNCI to predict the coding potential of the transcripts assembled in step 1 at each time point of cell reprogramming. As shown in Figure 5A, 13,072 long non-coding RNAs were obtained, accounting for 22.19% of all transcripts. As shown in Figure 5B, 10,361 of these are known long non-coding RNAs (79.26%) and 2,711 are novel long non-coding RNAs (20.74%). R was used to compare the expression levels and coding potential of long non-coding RNAs and coding genes in MEFs and iPSCs. As shown in Figures 5C and 5D, the expression level and length of ...
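The text does not say how the CPC2 and CNCI predictions are combined. A minimal sketch, assuming a transcript is kept only when both tools call it non-coding, is given below, together with a check that the reported counts and percentages agree. The function names, the label string `"noncoding"`, and the transcript IDs are hypothetical; only the three counts come from the text.

```python
from typing import Dict, Set


def consensus_noncoding(cpc2: Dict[str, str], cnci: Dict[str, str]) -> Set[str]:
    """Return transcript IDs labelled 'noncoding' by both tools.

    Assumption: intersection of the two predictions; the source does not
    state the combination rule.
    """
    return {
        tid
        for tid, label in cpc2.items()
        if label == "noncoding" and cnci.get(tid) == "noncoding"
    }


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Toy predictions with made-up transcript IDs.
cpc2_calls = {"t1": "noncoding", "t2": "coding", "t3": "noncoding"}
cnci_calls = {"t1": "noncoding", "t2": "noncoding", "t3": "coding"}
candidates = consensus_noncoding(cpc2_calls, cnci_calls)

# Counts reported in the text.
TOTAL_LNCRNA = 13_072
KNOWN_LNCRNA = 10_361
NOVEL_LNCRNA = 2_711

known_pct = percent(KNOWN_LNCRNA, TOTAL_LNCRNA)
novel_pct = percent(NOVEL_LNCRNA, TOTAL_LNCRNA)
```

The known and novel counts sum to the reported total, and the two computed percentages match the figures quoted for Figure 5B.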

Embodiment 3

[0057] Example 3. Method for screening endogenous retrovirus-derived long non-coding RNA

[0058] This embodiment describes a method for screening endogenous retrovirus-derived long non-coding RNA using the annotation method in Example 1, and the specific method is as follows:

[0059] 1. Screen for long non-coding RNAs; see Example 2 for the specific method.

[0060] 2. Then use bedtools intersect to select, from the long non-coding RNAs obtained in step 1 and according to their chromosomal positions, those lying within 5 kb of an endogenous retrovirus as endogenous retrovirus-derived long non-coding RNAs (ERV-lncRNAs). As shown in Figure 6, 40.8% of long non-coding RNAs contained TE (transposable element) sequences, of which 59.3% were long non-coding RNAs associated with endogenous retroviruses.
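The position-based selection can be pictured with a small pure-Python sketch. It assumes that the 5 kb window is measured between each long non-coding RNA and an endogenous retrovirus annotation on the same chromosome, that coordinates follow the BED convention (0-based start, exclusive end), and that strand is ignored. It does not reproduce bedtools itself, and the interval names and coordinates are made up.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, a.start - b.end, b.start - a.end)


def near_erv(lncrnas: List[Interval], ervs: List[Interval],
             max_dist: int = 5000) -> List[str]:
    """Names of lncRNAs within `max_dist` bp of any ERV on the same chromosome."""
    hits = []
    for lnc in lncrnas:
        if any(lnc.chrom == erv.chrom and gap(lnc, erv) <= max_dist
               for erv in ervs):
            hits.append(lnc.name)
    return hits


# Hypothetical annotations.
ervs = [Interval("chr1", 16_000, 17_000, "ERV_a")]
lncrnas = [
    Interval("chr1", 10_000, 12_000, "lnc1"),  # 4 kb upstream of ERV_a
    Interval("chr1", 30_000, 31_000, "lnc2"),  # 13 kb downstream
    Interval("chr2", 10_000, 12_000, "lnc3"),  # different chromosome
]
erv_lncrnas = near_erv(lncrnas, ervs)
```

The nested loop is quadratic in the number of intervals, which is acceptable for a toy example; on genome-scale annotations a dedicated interval tool such as bedtools is the practical choice.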



Abstract

The invention provides a transcript annotation method and a method for screening long non-coding RNA and endogenous retrovirus-derived long non-coding RNA, and belongs to the field of bioinformatics. The annotation method is intended to provide accurate and complete transcripts and to recover long non-coding RNAs that are expressed at low levels or derived from repetitive sequences. RNA sequencing and small RNA sequencing data are combined to annotate transcripts, which yields complete and accurate transcript information, more accurate long non-coding RNA annotation, and accurate long non-coding RNA expression information. The method is applied to the screening of long non-coding RNA and endogenous retrovirus-derived long non-coding RNA: 2,711 newly predicted long non-coding RNAs are obtained, small non-coding RNAs are also screened, and endogenous retrovirus-derived long non-coding RNA accounts for 59.3%.

Description

Technical field

[0001] The invention belongs to the field of bioinformatics, and specifically relates to a transcript annotation method and a method for screening long non-coding RNA and endogenous retrovirus-derived long non-coding RNA.

Background technique

[0002] The annotation of RNA transcripts relies mainly on high-throughput RNA-seq (transcriptome sequencing) data, and a common problem is that the precise boundaries of transcripts are difficult to define. Under ideal conditions, RNA-seq reads should cover all expressed transcripts without bias. In practice, because of read length limitations, sample degradation, library construction methods, and base bias, the coverage of RNA-seq reads is biased, and coverage is lost at transcript ends in particular. This affects the completeness of transcript annotation and biases transcript identification, expression quantification, and downstream functional analysis. ...


Application Information

IPC(8): G16B25/10, G16B30/00, G16B35/20, G16B50/10, G16B50/30
CPC: G16B25/10, G16B30/00, G16B35/20, G16B50/10, G16B50/30
Inventor: 孔庆然, 杜佳伟, 侯卫博, 丁春明
Owner WENZHOU MEDICAL UNIV