Protein-DNA binding residue prediction method based on sampling and integrated learning

A prediction method for protein-DNA binding residues, applicable to proteomics, genomics, informatics and related fields. It addresses the low prediction accuracy and information loss of the final model, with the effects of enriching feature sources, preventing overfitting and reducing information loss.

Status: Inactive
Publication Date: 2019-01-04
NANJING UNIV OF SCI & TECH
3 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

Random downsampling discards the negative samples that are not selected, which can easily cause information loss and lower the prediction accuracy of the final model.
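The information loss described here can be made concrete with a minimal sketch. The function name `random_downsample` and the toy labels are hypothetical and are not taken from the patent; the sketch only shows plain random downsampling, that is, the baseline whose drawback is being described.

```python
import random

def random_downsample(labels, seed=0):
    """Keep all positives plus an equal number of randomly chosen negatives.

    Returns (kept_indices, discarded_negative_indices).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    kept_set = set(kept_neg)
    discarded = [i for i in neg if i not in kept_set]
    return sorted(pos + kept_neg), discarded

# Made-up toy labels: 1 = binding residue, 0 = non-binding residue.
labels = [1] * 5 + [0] * 45
kept, discarded = random_downsample(labels)
print(len(kept), len(discarded))  # 10 kept, 40 negatives never seen by the model
```

In this toy case 40 of the 45 negative samples never reach the model, which is the kind of loss the summary refers to.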

Embodiment Construction

[0015] The present invention will be further described below in conjunction with the accompanying drawings.

[0016] The accompanying drawing shows a schematic structural diagram of the prediction system of the present invention. As shown in the drawing, according to an embodiment of the present invention, a method for predicting protein-DNA binding residues based on sampling and integrated learning includes the following steps. First, given a set of protein sequences, the PSI-BLAST, PSIPRED, SANN and AAFD-BN algorithms are used to extract, respectively, the evolutionary information, predicted secondary structure information, predicted solvent accessibility information and amino acid frequency difference information of each protein sequence. On this basis, sliding window and serial feature fusion techniques are combined to represent each amino acid residue in the sequence as a feature vector, and a training sample set is constructed with residues as the unit. S...
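The sliding-window and serial-fusion step can be illustrated with a short sketch. Everything below is an assumption made for illustration: the helper names `window_features` and `serial_fusion`, the zero padding at the sequence ends, the window size of 3, the order of the two operations, and the tiny made-up matrices that stand in for real PSI-BLAST and PSIPRED outputs.

```python
def window_features(per_residue, window=3):
    """Concatenate the features of each residue with those of its neighbours.

    per_residue: one feature list per residue, all of the same length.
    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it is centred on a residue")
    half = window // 2
    pad = [0.0] * len(per_residue[0])
    out = []
    for i in range(len(per_residue)):
        vec = []
        for j in range(i - half, i + half + 1):
            vec.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        out.append(vec)
    return out

def serial_fusion(*feature_sets):
    """Serially fuse feature sources by joining their vectors end to end."""
    fused = []
    for i in range(len(feature_sets[0])):
        vec = []
        for fs in feature_sets:
            vec.extend(fs[i])
        fused.append(vec)
    return fused

# Made-up stand-ins for a 3-residue sequence (not real tool output).
evo = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 2 "evolutionary" values per residue
ss = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]       # 3 "secondary structure" values per residue

fused = serial_fusion(window_features(evo), window_features(ss))
print(len(fused), len(fused[0]))  # 3 residues, 3*2 + 3*3 = 15 features each
```

Each residue ends up as one fixed-length vector, so it can serve as a single training sample regardless of its position in the sequence.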

Abstract

The invention discloses a protein-DNA binding residue prediction method based on sampling and integrated learning. The method comprises the steps of (1) feature extraction and training sample set construction, (2) sampling and model training, (3) model integration, and (4) online prediction. The method addresses the low prediction precision caused by the small number of feature types and by class imbalance in protein-DNA binding residue prediction, and offers high prediction precision and strong generalization ability.
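Steps (2) to (4) can be sketched roughly as follows. This is not the patented procedure: plain random sampling stands in for the hyperplane-distance downsampling, a nearest-centroid scorer stands in for the improved boosting learner, and all names (`train_base`, `score`, `train_ensemble`, `predict`) and the toy data are hypothetical. The sketch only shows the overall shape of training several models on different balanced subsets and integrating their outputs.

```python
import random

def centroid(rows):
    """Mean vector of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_base(X, y):
    """Stand-in base learner: remember the positive and negative centroids."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def score(model, x):
    """Score in [0, 1]; higher means closer to the positive centroid."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

def train_ensemble(X, y, n_models=5, seed=0):
    """Step (2): train one base model per balanced, downsampled subset."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    models = []
    for _ in range(n_models):
        idx = pos + rng.sample(neg, min(len(pos), len(neg)))
        models.append(train_base([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict(models, x, threshold=0.5):
    """Steps (3) and (4): average the model scores, then threshold."""
    mean = sum(score(m, x) for m in models) / len(models)
    return mean, int(mean >= threshold)

# Made-up imbalanced toy data: 3 positives near (1, 1), 30 negatives near (0, 0).
X = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]] + [[0.01 * k, -0.01 * k] for k in range(30)]
y = [1, 1, 1] + [0] * 30

models = train_ensemble(X, y)
print(predict(models, [1.0, 1.0])[1], predict(models, [0.0, 0.0])[1])
```

Because each model sees a different subset of the negatives, the ensemble as a whole uses more of the negative class than any single downsampled model does.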

Description

Technical field

[0001] The invention relates to the bioinformatics field of protein-ligand binding residue prediction, and specifically to a protein-DNA binding residue prediction method with high precision and strong generalization ability, based on a hyperplane-distance downsampling algorithm and an improved adaptive boosting (AdaBoost) algorithm.

Background technique

[0002] In cells, proteins often need to bind to DNA molecules in order to take part in various life activities, such as DNA replication, DNA repair and viral infection. Accurate identification of protein-DNA binding residues facilitates the analysis of protein function and the design of new drugs. Traditionally, researchers have used biochemical methods such as EMSAs, Fast ChIP and X-ray crystallography to identify protein-DNA binding residues. However, such methods are time-consuming and expensive, and cannot meet the urgent needs of related research in the post-genomic era where protein-DNA complexes are rapidly ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/00
Inventors: 於东军, 朱一亨, 胡俊
Owner: NANJING UNIV OF SCI & TECH