
Protein subnuclear localization method for feature extraction and fusion based on improved PSSM

A feature extraction and protein technology in the fields of biology and information technology. It addresses data set imbalance, low prediction accuracy on such data sets, and the limitations of existing feature extraction, and it strengthens the complementarity of the extracted features and improves the recognition rate.

Active Publication Date: 2019-03-08
YUNNAN UNIV
Cites: 4 · Cited by: 9

AI Technical Summary

Problems solved by technology

[0004] To sum up, the technical problem in the prior art is as follows. Although these models capture more sequence information about amino acid interactions, they are still limited to the effective discriminative information in a single column or row, in two specific columns, or in two rows separated by a variable interval. The extracted features are therefore too narrow to express the overall characteristics of the protein sequence.
Effective feature extraction directly affects the results of the classifier. Samples in proteomics data are generally high-dimensional, so selecting informative features, removing irrelevant ones, and alleviating the "curse of dimensionality" remain challenging. In addition, proteomics data sets are often imbalanced (for example, the multi-pass membrane protein data set). Such imbalance leads to low prediction accuracy for classes with few samples, and it has become a key difficulty and research focus in proteomics.
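Neither the improved MIC algorithm nor the W-SVM weighting scheme is specified in this excerpt. As a rough, hypothetical illustration of the two problems above, the Python sketch below scores each feature with a crude MIC-like statistic (the best normalized mutual information over a few equal-width binnings of the feature; this is a simplification, not the true MIC grid search or the patent's improved variant) and derives inverse-frequency class weights of the kind a weighted SVM can use so that minority classes are not swamped. All function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mic_like_score(feature, labels, max_bins=8):
    """Crude MIC-style score in [0, 1]: best MI / log(min(bins, classes)) over bin counts."""
    n_classes = len(set(labels))
    best = 0.0
    for b in range(2, max_bins + 1):
        norm = math.log(min(b, n_classes))
        if norm > 0:
            mi = mutual_information(equal_width_bins(feature, b), labels)
            best = max(best, mi / norm)
    return best


def select_top_k(features, labels, k):
    """Rank feature columns by the MIC-like score and return the indices of the top k."""
    n_features = len(features[0])
    scores = [mic_like_score([row[j] for row in features], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def balanced_class_weights(labels):
    """Inverse-frequency weights n_samples / (n_classes * count): rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

The weight formula is the same heuristic scikit-learn applies for `class_weight="balanced"` in `SVC`; whether the patent's W-SVM weights classes this way is an assumption.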


Examples


Embodiment 1

[0040] As shown in Figures 1 and 2:

[0041] A protein subnuclear localization method for feature extraction and fusion based on an improved PSSM comprises the following steps:

[0042] Step 1: Obtain a protein data set, determine whether it is a single-label or multi-label problem (the present invention mainly targets single-label problems), convert the data set into standard FASTA format, and annotate the category of every sample.

[0043] In Step 1, the acquired data are screened by applying a threshold to the length of each sequence (generally, sequences longer than 50 residues are kept).
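A minimal Python sketch of Step 1, assuming the sequences are available as FASTA text and that each class label is stored as the last `|`-separated field of the header (a hypothetical convention; the source does not say how annotations are stored). It keeps only sequences strictly longer than 50 residues, following the threshold above.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs; sequences may span lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_length=50):
    """Keep only sequences strictly longer than min_length residues."""
    return [(h, s) for h, s in records if len(s) > min_length]


def label_from_header(header):
    """Hypothetical convention: the class label is the last '|' field, e.g. 'P12345|CY' -> 'CY'."""
    return header.split("|")[-1]
```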

[0044] Step 2: Set the number of iterations to 3 and the E-value of each protein alignment search to 0.001, and compute the PSSM of each sequence. Each protein is represented as P = [P1, P2, ..., P20], where Pj = [P1j, P2j, ..., PLj]^T (j = 1, 2, ..., 20) is the j-th column of the L × 20 matrix and L is the length of the protein.
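The source does not name the alignment tool, but an iteration count plus an E-value threshold are the usual PSI-BLAST settings for building a PSSM. The sketch below assumes BLAST+ `psiblast` and a locally formatted protein database (the database name is a placeholder), builds the command for 3 iterations at E-value 0.001, and parses the resulting ASCII PSSM into the L × 20 matrix P, whose j-th column is Pj.

```python
def psiblast_command(query_fasta, db, out_pssm, iterations=3, evalue=0.001):
    """Build a BLAST+ psiblast command line that writes an ASCII PSSM."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,  # assumption: the source does not state which database is searched
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out_ascii_pssm", out_pssm,
    ]


def parse_ascii_pssm(text):
    """Return the L x 20 log-odds block of a psiblast ASCII PSSM as a list of rows.

    Residue lines look like: '  1 M  -1 -2 ... (20 scores) ... (20 percentages) info weight'.
    Only the 20 log-odds scores after the position number and residue letter are kept.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) >= 22 and tokens[0].isdigit() and len(tokens[1]) == 1 and tokens[1].isalpha():
            rows.append([int(t) for t in tokens[2:22]])
    return rows


def pssm_column(rows, j):
    """Column vector Pj = [P1j, ..., PLj] of the parsed matrix."""
    return [r[j] for r in rows]
```

The command can be executed with `subprocess.run(cmd, check=True)`; header and footer lines of the PSSM file (amino-acid letters, Lambda/K statistics) are skipped because they do not start with a position number and a single residue letter.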

[0045] Step 3: Convert the position s...

Embodiment 2

[0064] The present invention is experimentally verified on the published apoptosis protein data set ZD98, which was established by Zhou and Doctor in 2003. The data set contains apoptosis protein sequences from four subcellular locations: cytoplasmic proteins (CY), plasma membrane-bound proteins (ME), mitochondrial proteins (MI), and other proteins (OTHER). In Table 1, OA denotes the overall correct recognition rate. The results in Table 1 were obtained by fusing features strictly according to the feature extraction methods and fusion strategies described above. For feature selection, only the traditional linear discriminant analysis (LDA) algorithm is currently used for dimensionality reduction, and the results are already better than those of traditional feature extraction methods. As Table 1 shows, the proposed algorithm outperforms the other algorithms on these objective evaluation metrics.
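The source defines OA only as the overall correct recognition rate. Under the usual definition (correct predictions divided by all predictions), it can be computed as in the sketch below; per-class accuracy is added because, as the problem statement notes, a high OA can hide poor accuracy on classes with few samples. The labels in the test are toy values, not ZD98 results.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """OA: fraction of all samples whose predicted location equals the true location."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy within each true class (recall), exposing weak minority classes."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}
```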

[0065] Table 1 Fusion results based on different fusion methods

[0066] ...


Abstract

The invention discloses a protein subnuclear localization method for feature extraction and fusion based on an improved PSSM, and relates to the fields of biology and information technology. The method comprises the following steps. First, the position-specific scoring matrix (PSSM) that encodes the evolutionary information of a protein sequence is normalized with a Z-SoftMax function. Second, the proposed SC-PSSM-C and SC-PSSM-R descriptors extract features from the PSSM in different directions at different skip intervals, converting the variable-length PSSM into fixed-length features. Third, the fused features are screened with an improved maximal information coefficient (MIC) algorithm. Finally, a W-SVM classifier with optimized parameters performs the final classification. The method overcomes the limitations and narrowness of conventional feature extraction and improves protein subnuclear localization.
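The exact Z-SoftMax, SC-PSSM-C, and SC-PSSM-R definitions are not included in this excerpt, so the following Python sketch is only one plausible reading, and every function in it is hypothetical. It assumes Z-SoftMax means z-score standardization of each PSSM column followed by a softmax over the 20 scores of each residue row. In place of the SC-PSSM descriptors it uses two generic gapped-product features: a column-direction one that pairs residues g positions apart (400 values per gap, similar to k-separated bigrams) and a row-direction one that pairs amino-acid columns g apart within the same residue. Concatenating both for several gaps gives a vector whose length does not depend on the protein length L.

```python
import math


def z_softmax(pssm):
    """Hypothetical Z-SoftMax: z-score each column over residues, then softmax each row."""
    L, n = len(pssm), len(pssm[0])
    z = [[0.0] * n for _ in range(L)]
    for j in range(n):
        col = [pssm[i][j] for i in range(L)]
        mean = sum(col) / L
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / L)
        for i in range(L):
            z[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    out = []
    for row in z:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def column_gap_features(P, g):
    """400 values: mean of P[i][a] * P[i + g][b] over residues i, for all column pairs (a, b)."""
    L, n = len(P), len(P[0])
    if L <= g:
        raise ValueError("protein must be longer than the gap g")
    return [sum(P[i][a] * P[i + g][b] for i in range(L - g)) / (L - g)
            for a in range(n) for b in range(n)]


def row_gap_features(P, g):
    """n - g values: mean of P[i][j] * P[i][j + g] over residues i, for each column j."""
    L, n = len(P), len(P[0])
    return [sum(P[i][j] * P[i][j + g] for i in range(L)) / L for j in range(n - g)]


def fused_features(pssm, gaps=(1, 2, 3)):
    """Concatenate both directions for several gaps into one fixed-length vector."""
    P = z_softmax(pssm)
    vec = []
    for g in gaps:
        vec += column_gap_features(P, g)
        vec += row_gap_features(P, g)
    return vec
```

With the default gaps (1, 2, 3) the vector always has 3 * 400 + (19 + 18 + 17) = 1254 entries, which is what lets proteins of different lengths be fed to the same feature selector and classifier.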

Description

Technical field [0001] The present invention relates to the fields of biology and information technology, and in particular to a protein subnuclear localization method for feature extraction and fusion based on an improved PSSM.
Background art [0002] With the spread and improvement of human genome sequencing technology, a large number of protein sequences have been generated. Over the last 20 years, determining the functions of newly sequenced proteins has become one of the hot topics in bioinformatics research. The function of a protein depends on its location in the cell, so determining a protein's subcellular location is considered an important step toward understanding its function. Protein subnuclear localization information can provide important clues for disease prevention, diagnosis, and treatment. Traditionally, obtaining this information requires many repeated biological experiments, which cost a great deal of time and money. I...


Application Information

IPC (8): G16B20/00; G06K9/62; G06K9/46
CPC: G06V10/40; G06F18/214; G06F18/241
Inventors: 聂仁灿 (Nie Rencan), 阮小利 (Ruan Xiaoli), 周冬明 (Zhou Dongming), 贺康建 (He Kangjian), 李华光 (Li Huaguang)
Owner: YUNNAN UNIV