Protein fold recognition method based on triplet loss

A protein fold recognition method, applicable to the analysis of two-dimensional or three-dimensional molecular structures, instruments, and biological neural network models. It addresses the indirectness, inefficiency, and very large feature dimension (the "curse of dimensionality") of existing approaches, and achieves faster recognition and higher accuracy.

Active Publication Date: 2020-12-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0007] Deep neural networks can automatically extract features from the input data, which greatly improves the accuracy of fold recognition. However, current methods of this kind can only train the network on samples of known fold types and then use the features of an intermediate layer to generalize to proteins of unknown fold type.
The main disadvantage of this approach is that it is indirect and inefficient: one can only hope that the intermediate-layer features generalize well to new proteins, and the dimension of those features is very large, which leads to the "curse of dimensionality".



Examples


Embodiment Construction

[0028] As shown in Figure 2, the protein fold recognition method based on triplet loss first encodes the protein with one-hot encoding to obtain a numerical representation of the protein sequence. This representation is then input into the SSA program to obtain the contact map between the protein's residues, referred to here as RRcontact. RRcontact is then used as the input to the pre-trained deep learning framework, and the output of the network is the protein's fold-specific recognition feature, denoted f. Finally, the feature of the query protein, f(query), is compared with the features f(template) of the template proteins of known fold class in the protein database, and the fold class of the template protein nearest to the query protein is assigned to the query protein.
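The final comparison step can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the Euclidean distance, the function names `euclidean` and `assign_fold`, and the toy three-dimensional features are all assumptions, since the text above does not state which distance is used or how long f is.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_fold(query_feature, templates):
    """Return the fold label of the template nearest to the query.

    templates is a list of (fold_label, feature_vector) pairs; both the
    structure and the name are hypothetical.
    """
    if not templates:
        raise ValueError("at least one template is required")
    label, _ = min(templates, key=lambda t: euclidean(query_feature, t[1]))
    return label


# Toy template database with made-up labels and features.
templates = [
    ("fold_a", [0.0, 0.0, 1.0]),
    ("fold_b", [1.0, 0.0, 0.0]),
    ("fold_a", [0.1, 0.0, 0.9]),
]
predicted = assign_fold([0.9, 0.1, 0.0], templates)
print(predicted)
```

With these toy values the query lies closest to the single `fold_b` template, so that label is returned.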

[0029] The process is described in more detail below in conjunction with the accompanying drawings:

[0030] Step 1: Training data preprocessing: Use one-hot encoding to ...
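As a general illustration of one-hot encoding a protein sequence (not a reconstruction of the truncated step above), the following Python sketch maps each residue to a 20-element indicator row. The alphabet order, the all-zero row for unknown residues, and the name `one_hot_encode` are assumptions.

```python
# Illustrative only: alphabet order and unknown-residue handling are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-element 0/1 row per residue of the sequence."""
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        position = INDEX.get(residue)
        if position is not None:  # unknown residues (e.g. 'X') stay all-zero
            row[position] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("MKV")
for residue, row in zip("MKV", encoded):
    print(residue, row)
```

A sequence of length L therefore becomes an L x 20 matrix with exactly one 1 per row for standard residues.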


Abstract

The invention discloses a protein fold recognition method based on triplet loss, which comprises the following steps: encoding the protein using one-hot encoding; inputting the encoded protein into an SSA program to obtain a contact map between the protein's residues; using the contact map as the input to a pre-trained deep learning framework, wherein the output of the network is the protein's fold-specific recognition feature; comparing the feature of the query protein with those of template proteins of known fold class in a protein database; and assigning the fold class of the template protein closest to the query protein to the query protein. The method borrows the training idea of triplet loss, so that protein structures of the same class are drawn closer together and protein structures of different classes are pushed farther apart. As a result, the feature representation of a protein is more discriminative and recognition is more efficient.
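The triplet-loss idea mentioned in the abstract can be illustrated with the commonly used margin formulation, max(0, d(a, p) - d(a, n) + margin). The sketch below is a minimal example under stated assumptions: the squared Euclidean distance, the margin of 1.0, and the function names are not taken from the patent.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss; the margin value is an assumption.

    The loss is zero once the anchor is closer to the positive (same fold)
    than to the negative (different fold) by at least the margin.
    """
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


# Toy 2-D features: the positive is near the anchor, the negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
easy = triplet_loss(anchor, positive, negative)  # constraint already satisfied
hard = triplet_loss(anchor, negative, positive)  # roles swapped: violated
print(easy, hard)
```

Minimizing this quantity over many triplets is what pulls same-class features together and pushes different-class features apart.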

Description

Technical field

[0001] The invention belongs to the field of bioinformatics prediction of protein structure, and in particular relates to a protein fold recognition method based on triplet loss.

Background

[0002] Determining the fold type of a protein can reveal the "second genetic code" of life, namely how the primary structure of a protein determines its spatial structure. The three-dimensional structure of a protein plays a crucial role in the study of its function and properties, and correct fold recognition is a key part of predicting that three-dimensional structure. In addition, because the way a protein folds has a profound impact on protein heterogeneity and molecular function, fold recognition can greatly advance the artificial design of proteins in medicine, the search for pathogenic mechanisms, and the refolding of inclusion bodies. Therefore, fast and accurate identification of protein folding ty...


Application Information

IPC(8): G16B15/20, G06N3/04
CPC: G16B15/20, G06N3/045
Inventor: 於东军, 刘岩
Owner: NANJING UNIV OF SCI & TECH