Protein-DNA binding site prediction method based on multiple kernel learning and the Boosting algorithm

A technology combining binding site prediction and multiple kernel learning, applied in bioinformatics to predict protein-DNA interactions. It addresses the large gap between prediction accuracy and practical application, the insufficient treatment of differences between feature views and of sample imbalance, and poor interpretability. Its effects are improved prediction accuracy, a clearer working principle, and improved interpretability.

Inactive Publication Date: 2016-07-27
NANJING UNIV OF SCI & TECH
Cites: 5. Cited by: 11.

AI Technical Summary

Problems solved by technology

[0006] In existing sequence-based protein-DNA binding site prediction, the differences between feature views and the sample imbalance problem are not fully considered, so the predic...

Examples

Embodiment Construction

[0023] To aid understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.

[0024] Aspects of the invention are described in this disclosure with reference to the accompanying drawings, which show a number of illustrated embodiments. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be understood that the various concepts and embodiments described above, as well as those described in more detail below, can be implemented in any of a number of ways, because the concepts and embodiments disclosed by the present invention are not limited to any particular implementation. In addition, some aspects of the present disclosure may be used alone or in any suitable combination with other aspects of the present disclosure.

[0025] Figure 1 is a schematic diagram of the system structure of t...

Abstract

The present invention provides a protein-DNA binding site prediction method based on multiple kernel learning and the Boosting algorithm. The method comprises the following steps: feature extraction, in which an evolutionary information feature vector and a solvent accessibility feature vector are extracted for each amino acid residue; feature fusion, in which a linear-kernel-based multiple kernel learning algorithm estimates the weights of the two feature vectors, and the vectors are serially combined according to those weights to obtain the final sample feature vector; random downsampling, in which the non-binding site samples are downsampled multiple times and each resulting non-binding site subset is combined with the binding site sample set to train an SVM, yielding a plurality of SVM prediction models; and integration, in which the Boosting algorithm combines the SVM models into the final prediction model. The disclosed method improves model interpretability, effectively reduces the size of the training set, and improves the prediction precision of the model.
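
The feature fusion and downsampling steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of plain lists, and the 1:1 ratio of binding to non-binding samples in each subset are assumptions made for illustration, and the two view weights are taken as given inputs rather than estimated by multiple kernel learning. SVM training and Boosting integration are left out.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse_features(evolutionary: Sequence[float], accessibility: Sequence[float],
                  w_evo: float, w_acc: float) -> Vector:
    """Weighted serial combination: scale each view by its weight, then concatenate.

    The weights are assumed to come from an earlier weight-estimation step.
    """
    return [w_evo * x for x in evolutionary] + [w_acc * x for x in accessibility]


def balanced_subsets(binding: Sequence[Vector], non_binding: Sequence[Vector],
                     n_subsets: int, seed: int = 0
                     ) -> List[Tuple[List[Vector], List[int]]]:
    """Randomly downsample the non-binding (majority) class n_subsets times.

    Each subset keeps every binding sample (label +1) and adds a random
    sample of non-binding residues (label -1). The 1:1 class ratio is an
    assumption of this sketch, not something the source specifies.
    """
    rng = random.Random(seed)
    k = min(len(binding), len(non_binding))
    subsets = []
    for _ in range(n_subsets):
        negatives = rng.sample(list(non_binding), k)  # sampled without replacement
        samples = list(binding) + negatives
        labels = [1] * len(binding) + [-1] * k
        subsets.append((samples, labels))
    return subsets
```

Each `(samples, labels)` pair returned by `balanced_subsets` would then serve as the training set for one base SVM, so every base model sees all binding residues but only a small, different slice of the non-binding ones.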

Description

Technical field

[0001] The invention relates to the field of bioinformatics prediction of protein-DNA interactions, and in particular to a protein-DNA binding site prediction method based on multiple kernel learning and the Boosting algorithm.

Background

[0002] Interaction between protein and DNA is common in life activities and occurs widely across living organisms. This interaction plays a vital role: for example, most activities such as DNA replication, DNA transcription and DNA expression require cooperation between protein and DNA to be completed successfully. The interaction usually manifests as DNA binding to certain fixed residues in proteins (namely, DNA binding sites), so that DNA can cooperate with proteins to complete a given life activity. DNA binding sites in proteins are also often important targets for certain drugs. To thoroughly understand the process of life activities, especially...

Application Information

IPC(8): G06F19/18
CPC: G16B20/00
Inventor: 於东军, 胡俊, 李阳, 沈红斌, 杨静宇
Owner: NANJING UNIV OF SCI & TECH