
Xgboost-based whole-genome RNA secondary structure prediction method

A whole-genome RNA secondary structure prediction technology, applied in the field of bioinformatics research, that addresses the problem of too many or too few output targets reducing the accuracy rate.

Inactive Publication Date: 2019-01-15
SUN YAT SEN UNIV
9 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

In practice, manually set thresholds often produce too many or too few output targets, which greatly reduces the accuracy rate.



Examples


Embodiment Construction

[0022] The present invention will be further described in conjunction with the accompanying drawings.

[0023] The method aims to learn, through supervised learning on the most basic sequence data, the key features that determine how a primary sequence forms its secondary structure, and to use those features to help judge how the secondary structure arises from the primary sequence.

[0024] As shown in Figure 1, the dataset is obtained first. The dataset is derived from the results of biological experiments, which provide the RNA sequences and the pairing probability value of each base site in those sequences. Three datasets are used, labeled PARS-human, PARS-yeast, and PDB-Xray. PARS stands for "Parallel Analysis of RNA Structure", an experimental method proposed in 2010 for determining the secondary structure of RNA. PARS-Human refers to a...


Abstract

The invention provides an Xgboost-based whole-genome RNA secondary structure prediction method. The method includes the following steps: an RNA sequence and the pairing probability value of each base site in the RNA sequence are obtained; each base with a high pairing probability is combined with the bases of a certain length upstream and downstream of it to form a sequence fragment, and these fragments are used as positive samples; each base with a low pairing probability is combined with the bases of a certain length upstream and downstream of it to form a sequence fragment, and these fragments are used as negative samples; the positive and negative samples are combined into a sample data set, which is divided into a training set and a test set; the training set and the test set are loaded into a machine learning model built on the Xgboost algorithm, and the model is trained and tested; and the trained and tested model is used to predict the RNA secondary structure. With this method, a pairing probability score is obtained for each base site when the RNA forms its secondary structure, and these scores provide a basis for judging the formation of the secondary structure in the next step.
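The sample-construction and training steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the flank length, the 0.8 and 0.2 probability cut-offs, the one-hot encoding, the choice to skip sites too close to the sequence ends, and the XGBoost hyperparameters are all assumptions introduced here, since the source only speaks of "high" and "low" probability and "a certain length".

```python
import random

BASES = "ACGU"


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 list, four values per base (assumed encoding)."""
    vec = []
    for base in fragment:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def build_samples(seq, pair_prob, flank=2, hi=0.8, lo=0.2):
    """Cut a window of `flank` bases on each side of every confidently labeled site.

    Sites with probability >= hi become positive samples (label 1), sites with
    probability <= lo become negative samples (label 0), and the rest are skipped.
    Sites closer than `flank` to either end are skipped (an assumption).
    """
    fragments, labels = [], []
    for i in range(flank, len(seq) - flank):
        p = pair_prob[i]
        if p >= hi:
            label = 1
        elif p <= lo:
            label = 0
        else:
            continue
        fragments.append(seq[i - flank:i + flank + 1])
        labels.append(label)
    return fragments, labels


def split_samples(features, labels, test_fraction=0.25, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(len(order) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])


def train_and_score(train_X, train_y, test_X):
    """Fit an XGBoost classifier and return a pairing score per test fragment.

    numpy and xgboost are imported here so the helpers above run without them.
    The hyperparameters are placeholders, not values from the patent.
    """
    import numpy as np
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
    model.fit(np.array(train_X), np.array(train_y))
    # Column 1 of predict_proba is the probability of the positive (paired) class.
    return model.predict_proba(np.array(test_X))[:, 1]


if __name__ == "__main__":
    seq = "GGGAAACCC"  # toy sequence, not real data
    prob = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
    frags, labels = build_samples(seq, prob, flank=1)
    print(list(zip(frags, labels)))
```

Keeping only sites at the two extremes of the probability range gives the classifier cleanly labeled examples; the model's output score for a new fragment can then serve as the per-site pairing score the abstract describes.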

Description

Technical field

[0001] The invention relates to the field of bioinformatics research, in particular to an Xgboost-based method for predicting the secondary structure of whole-genome RNA.

Background technique

[0002] The prediction of RNA secondary structure is an important research field of molecular biology, and it is of great significance in promoting the development of life science. The molecular structure of RNA consists of three levels: primary structure, secondary structure, and tertiary structure. The RNA secondary structure refers to the stem-loop structure formed by the RNA sequence itself. It lies between the primary structure and the tertiary structure and stores more high-level structural information. The study of RNA secondary structure has therefore become an important research question in the field of bioinformatics. There are mainly two methods to determine the secondary structure: the experimental method of physical chemistry and the predict...


Application Information

IPC(8): G16B40/00
Inventor: 肖侬, 柯耀斌, 饶家华, 杨跃东, 陈志广, 卢宇彤
Owner SUN YAT SEN UNIV