Protein secondary structure engineering prediction method based on large margin nearest central point

A protein secondary structure prediction method, applied in special data processing applications, instruments, and electrical digital data processing, that addresses problems such as low prediction efficiency and local minima in the data weights, and achieves fast and efficient prediction.

Inactive Publication Date: 2010-08-04
HARBIN INST OF TECH
2 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide an engineering prediction method for protein secondary structure based on the large margin nearest central point, in order to solve the problems that existing protein secondary structure prediction methods suffer from local minima in the data weights and low prediction efficiency when machine learning algorithms are used.



Examples


Specific Embodiment 1

[0018] Specific Embodiment 1: This embodiment is described below with reference to Figure 1 and Figure 2. An engineering prediction method for protein secondary structure based on the large margin nearest central point is realized by the following steps:

[0019] Step 1. Download the published NCBI nr database and protein structure data in PDB format, and construct a non-redundant protein secondary structure training data set from the PDB-format protein structure data;

[0020] Step 2. Given the primary sequence data of the target protein, construct a multiple sequence alignment feature vector for each residue in the primary sequence of the target protein according to the NCBI nr database provided in step 1;

[0021] Step 3. Based on the multiple sequence alignment feature vectors of the target protein sequence constructed in step 2, call the large margin nearest central point algorithm to obtain the secondary structure predict...
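As a rough illustration of step 2, per-residue feature vectors are often built by sliding a window over a per-residue profile derived from a multiple sequence alignment. The sketch below assumes that approach; the profile layout, the zero padding at the chain ends, the function name `window_features`, and the default `half_width` are all assumptions and are not stated in the text above.

```python
import numpy as np

def window_features(profile: np.ndarray, half_width: int = 7) -> np.ndarray:
    """Build one feature vector per residue from a per-residue profile.

    profile:    array of shape (n_residues, n_columns), for example one
                alignment-derived score row per residue (assumed layout).
    half_width: number of neighbouring residues taken on each side of the
                central residue (assumed value, not from the source).
    """
    n_residues, n_columns = profile.shape
    # Zero-pad both ends so residues near the termini get full-length vectors.
    padded = np.pad(profile, ((half_width, half_width), (0, 0)))
    width = 2 * half_width + 1
    features = np.empty((n_residues, width * n_columns), dtype=profile.dtype)
    for i in range(n_residues):
        # Rows i .. i + width - 1 of the padded profile are centred on residue i.
        features[i] = padded[i:i + width].reshape(-1)
    return features
```

Each row of the result can then serve as the input vector for one residue in step 3.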

Specific Embodiment 2

[0074] Specific Embodiment 2: This embodiment further describes the engineering prediction method for protein secondary structure based on the large margin nearest central point described in Specific Embodiment 1. The initial hyperparameter μ described in step 3.3 takes one of the values 0, 0.1, 1, 5, 10 or 20, and the optimal value of μ among these candidates is quickly determined using the RS126 non-redundant data set.

[0075] Since the PDB training data set derived from the PDB database in step 1 contains a large number of protein chains, the subgradient projection algorithm takes a long time to converge on it. Therefore, the RS126 non-redundant data set is used to quickly determine the hyperparameter μ. The hyperparameter μ described in this embodiment regularizes the linear transformation matrix. Selecting an appropriate hyperparameter μ can prevent overfitting and prevent the learned mode...
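The selection of μ from a small candidate set can be sketched as a plain search over the listed values. In the sketch below, `select_mu` and the `evaluate` callback are hypothetical names: `evaluate` is assumed to train a model with the given μ on the small data set and return a score where higher is better. The source does not say how the candidates are scored.

```python
from typing import Callable, Sequence

# Candidate values for the regularization hyperparameter, as listed in the text.
MU_CANDIDATES = (0, 0.1, 1, 5, 10, 20)

def select_mu(evaluate: Callable[[float], float],
              candidates: Sequence[float] = MU_CANDIDATES) -> float:
    """Return the candidate mu with the highest score.

    evaluate is a hypothetical callback (an assumption, not from the source):
    it should train with the given mu and return a validation score.
    """
    if not candidates:
        raise ValueError("at least one candidate value is required")
    best_mu = candidates[0]
    best_score = float("-inf")
    for mu in candidates:
        score = evaluate(mu)
        # Strict ">" keeps the earliest candidate when scores are tied.
        if score > best_score:
            best_mu, best_score = mu, score
    return best_mu
```

For example, `select_mu(lambda mu: -(mu - 5) ** 2)` uses a toy score that peaks at 5 and therefore returns 5.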

Specific Embodiment 3

[0076] Specific Embodiment 3: This embodiment is described below with reference to Figure 3. This embodiment further limits the engineering prediction method for protein secondary structure based on the large margin nearest central point described in Specific Embodiment 1. In step 1, the non-redundant protein secondary structure training data set is constructed by the following steps:

[0077] Step 1.1. Based on the PDB-format protein structure data determined by X-ray crystal diffraction and released in the PDB database, apply the DSSP program to convert the PDB-format protein structure data into data files in DSSP format;

[0078] Step 1.2. Convert the data files in DSSP format into protein sequence data files in FASTA format according to the DSSP format definition. At the same time, the 8 secondary structure states defined by DSSP are grouped into 3 classes, among which the H conformation, G conformation, and I confo...
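The text is cut off before the full 8-to-3 mapping is given, so the exact grouping used here is not known. The sketch below assumes one commonly used reduction (H, G, I to helix; E, B to strand; everything else to coil), which may differ from the mapping in the full text. The names `reduce_dssp`, `HELIX_STATES`, and `STRAND_STATES` are illustrative.

```python
# Assumed reduction, NOT confirmed by the truncated source text:
# helix-like states (H, G, I) -> "H", strand-like states (E, B) -> "E",
# every other DSSP symbol (T, S, blank, "-", ...) -> "C".
HELIX_STATES = frozenset("HGI")
STRAND_STATES = frozenset("EB")

def reduce_dssp(states: str) -> str:
    """Map a string of DSSP state symbols to a 3-class string of H/E/C."""
    reduced = []
    for symbol in states:
        if symbol in HELIX_STATES:
            reduced.append("H")
        elif symbol in STRAND_STATES:
            reduced.append("E")
        else:
            reduced.append("C")
    return "".join(reduced)
```

Keeping the two state sets as module-level constants makes it easy to swap in a different grouping if the full text specifies one.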


Abstract

The invention relates to an engineering prediction method for protein secondary structure based on the large margin nearest central point. It belongs to the field of engineering prediction methods for protein secondary structure and solves the problems that existing protein secondary structure prediction methods suffer from local minima in the data weights and low prediction efficiency when machine learning algorithms are used. In the method of the invention, a non-redundant protein secondary structure training data set is first constructed from the PDB database; multiple sequence alignment features are then constructed for a target protein chain based on the NCBI nr database; finally, the large margin nearest central point algorithm is used to build a protein secondary structure prediction model. The large margin nearest central point algorithm uses a Euclidean-distance K-means clustering algorithm to determine the central point of each sample, and learns a linear transformation of the input space by minimizing a target loss function. The invention achieves fast, efficient, and accurate protein secondary structure prediction and is applicable to protein secondary structure prediction.
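To make the classification idea concrete, the sketch below shows only the prediction step: each sample is assigned the label of its nearest central point after both are mapped through a linear transformation. It assumes the central points (for example from K-means) and the transformation matrix are already available; learning the transformation is not shown because the loss function is not given in this text. The function name and argument layout are assumptions.

```python
import numpy as np

def predict_nearest_center(x, centers, center_labels, transform):
    """Label each sample with the class of its nearest central point.

    x:             (n_samples, d) feature vectors
    centers:       (n_centers, d) central points, e.g. K-means centroids
    center_labels: (n_centers,) class label of each central point
    transform:     (k, d) linear transformation applied before measuring
                   distance (assumed to be learned elsewhere)
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    center_labels = np.asarray(center_labels)
    transform = np.asarray(transform, dtype=float)
    # Map samples and central points into the transformed space.
    zx = x @ transform.T
    zc = centers @ transform.T
    # Squared Euclidean distance between every sample and every central point.
    diff = zx[:, None, :] - zc[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    return center_labels[np.argmin(dist2, axis=1)]
```

With the identity matrix as `transform` this reduces to an ordinary nearest-centroid rule; a different matrix rescales or mixes the feature dimensions and can change which central point is nearest.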

Description

Technical field

[0001] The present invention relates to an engineering prediction method for protein secondary structure based on machine learning, in particular to a method that combines the large margin nearest central point method with multiple sequence alignment features, and belongs to the field of engineering prediction methods for protein secondary structure.

Background

[0002] With the publication of the human genome map and the completion of more and more genome sequences of animals, plants, and microorganisms, biological science has entered the "post-genome era". Researchers now study the mysteries of life on the basis of the complete DNA sequence of the genetic material, and elucidating the functions of gene-encoded products (proteins) has become the main research goal. A series of studies have shown that the ability of a protein to perform its specific biological functions is determined by its specific structure. Therefo...


Application Information

IPC(8): G06F19/00, G06F17/30
Inventor: 王宽全, 杨伟, 左旺孟, 袁永峰, 张宏志
Owner: HARBIN INST OF TECH