Nuclear localization signal prediction algorithm based on a dual-recommendation system of frequent patterns and machine learning

A nuclear localization signal prediction technique based on frequent patterns, applied in the field of protein biology. It addresses the lack of outstanding performance in existing predictors, the difficulty of balancing accuracy and recall in nuclear localization signal prediction, and high redundancy, and it yields candidates with improved probability, statistical significance, and evolutionary significance.

Active Publication Date: 2019-04-16
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

For example: PSORT II produces many false positives, and its matching is limited to classical NLSs (cNLS); PredictNLS produces many false negatives, making it difficult to find new NLSs; the performance of NLStradamus depends on the assumption that NLSs follow a particular residue distribution, yet many NLSs have very different residue frequencies; cNLS Mapper is limited to cNLS, and the NLS activity data it uses come from yeast, which may limit its use for screening other species; NucImport is also built on cNLS and is limited for other NLS types; seqNLS is neither built on cNLS nor restricted to particular species, which is a step forward. Compared with other software its performance is good but not outstanding; in particular, because its predictions rely on frequent words from known NLSs, it ignores some special and uncommon NLSs.
[0005] In addition, the biggest problem in nuclear localization signal prediction is the difficulty of balancing accuracy and recall. Because the number of experimentally verified NLSs is limited and most of them are rich in basic amino acids, machine-learning-based NLS prediction algorithms tend to favor NLSs with a high content of basic amino acids.
Any fragment with many basic amino acids is easily taken to be an NLS, which results in high redundancy, while other types of NLS, such as those without basic amino acids, are ignored.
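The trade-off described above can be illustrated with a small sketch. It is not the patent's method: the `basic_rich_positions` heuristic, the window size, the threshold, and the toy sequence are all assumptions made for illustration. It flags every basic-rich window as an NLS and then measures residue-level precision and recall against one annotated signal.

```python
BASIC = set("KR")  # lysine and arginine, the basic residues counted here


def basic_rich_positions(sequence, window=5, min_basic=3):
    """Hypothetical heuristic: flag all residues in any window with >= min_basic K/R."""
    flagged = set()
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        if sum(residue in BASIC for residue in segment) >= min_basic:
            flagged.update(range(start, start + window))
    return flagged


def precision_recall(predicted, actual):
    """Residue-level precision and recall for predicted vs. annotated positions."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy sequence (an assumption): one annotated signal "PKKKRKV" at positions 2-8
# plus an unrelated basic stretch "KRK" further along.
toy_sequence = "AAPKKKRKVAAAAAKRKAAAAA"
annotated = set(range(2, 9))
predicted = basic_rich_positions(toy_sequence)
precision, recall = precision_recall(predicted, annotated)
```

On this toy input the heuristic recovers the whole annotated signal (recall 1.0) but also flags the second basic stretch and flanking residues, so precision falls to 7/16. That is the kind of redundancy the paragraph above describes.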


Examples

Detailed Description of the Embodiments

[0061] In order to make the technical means, creative features, goals and effects achieved by the present invention easy to understand, the present invention will be further described below in conjunction with specific embodiments.

[0062] Referring to Figure 1, Figure 2, and Figure 3, the nuclear localization signal prediction algorithm of the present invention, based on a dual-recommendation system of frequent patterns and machine learning, comprises the following steps:

[0063] S1. Build a nuclear localization signal training set and a non-nuclear-localization-signal training set as positive and negative samples, respectively. Specifically: select 145 experimentally verified NLSs with known parent proteins and specific forms from the NLSdb 2003 and 2017 databases as positive samples; each negative sample matches the length of its corresponding positive sample and comes from the same sequence.
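One way to read the negative-sample rule in S1 is sketched below. The source states only that each negative has the same length as its positive and comes from the same sequence; the requirement that it must not overlap the NLS, the random choice of position, and the function name `sample_negative` are assumptions made for this illustration.

```python
import random


def sample_negative(sequence, nls_start, nls_length, rng):
    """Pick a segment of the parent sequence with the same length as the NLS.

    Assumption: the segment must not overlap the NLS itself.
    Returns (start, segment), or None when no such segment fits.
    """
    nls_end = nls_start + nls_length
    candidates = [
        start
        for start in range(len(sequence) - nls_length + 1)
        if start + nls_length <= nls_start or start >= nls_end
    ]
    if not candidates:
        return None
    start = rng.choice(candidates)
    return start, sequence[start:start + nls_length]


# Toy parent sequence (an assumption) with a 7-residue positive sample at position 2.
parent = "AAPKKKRKVAAAAAKRKAAAAA"
rng = random.Random(0)  # fixed seed so the sketch is repeatable
negative = sample_negative(parent, nls_start=2, nls_length=7, rng=rng)
```

Drawing the negative from the same parent protein keeps sequence-level background (species, composition) the same for both classes, so a classifier has to separate them on the segment itself.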

[0064] S2. Using a known word vector model to extract the ...

Abstract

The invention relates to the field of protein biology, in particular to a nuclear localization signal prediction algorithm based on a dual-recommendation system of frequent patterns and machine learning. The algorithm proposes two models: one built on a frequent-pattern NLS prediction algorithm and one built on a machine-learning NLS prediction algorithm. The first model mainly uses the idea of the PrefixSpan algorithm to mine frequent subsequences that are enriched in a nuclear sequence database and sparse in non-nuclear sequence data; these frequent subsequences are then screened and evaluated to obtain candidate NLSs. The second model is mainly an SNM that takes word vectors as features, combined with a single-protein-sequence NLS prediction algorithm based on linear classification over statistics, imbalance grades, and a PSSM matrix, which improves the hit rate and redundancy of the algorithm. The algorithm improves NLS prediction precision and can better discover special NLSs that are not constrained by known NLSs.
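The first model's idea, mining subsequences that are frequent in nuclear sequences and sparse in non-nuclear ones, can be sketched as follows. This is a minimal PrefixSpan-style miner over single-residue items followed by a contrast filter. The thresholds, the function names, and the toy sequences are assumptions, and the patent's screening and evaluation steps are not reproduced.

```python
from collections import defaultdict


def prefixspan(sequences, min_support, max_length=4):
    """Mine frequent (possibly gapped) subsequences; returns {pattern: support}."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, position just after the last match).
        if len(prefix) >= max_length:
            return
        occurrences = defaultdict(dict)
        for seq_id, offset in projected:
            sequence = sequences[seq_id]
            for pos in range(offset, len(sequence)):
                # Record only the first occurrence of each item per sequence.
                occurrences[sequence[pos]].setdefault(seq_id, pos + 1)
        for item, hits in sorted(occurrences.items()):
            if len(hits) >= min_support:
                pattern = prefix + item
                results[pattern] = len(hits)
                grow(pattern, list(hits.items()))

    grow("", [(index, 0) for index in range(len(sequences))])
    return results


def is_subsequence(pattern, sequence):
    """True if pattern's residues appear in order (gaps allowed) in sequence."""
    remaining = iter(sequence)
    return all(residue in remaining for residue in pattern)


def contrast_patterns(positives, negatives, min_support, max_negative_hits, min_length=2):
    """Keep patterns frequent in positives and rare in negatives."""
    kept = {}
    for pattern, support in prefixspan(positives, min_support).items():
        if len(pattern) < min_length:
            continue
        negative_hits = sum(is_subsequence(pattern, seq) for seq in negatives)
        if negative_hits <= max_negative_hits:
            kept[pattern] = (support, negative_hits)
    return kept


# Toy data (assumptions, not taken from NLSdb).
nuclear = ["PKKKRKV", "AKRKLD", "GKRKRS"]
non_nuclear = ["GSAGLDE", "AVLKDE", "SGGSAA"]
candidates = contrast_patterns(nuclear, non_nuclear, min_support=3, max_negative_hits=0)
```

Support here counts sequences rather than occurrences, so a residue repeated many times in one protein does not inflate a pattern's frequency.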

Description

Technical field

[0001] The present invention relates to the field of protein biology, in particular to a nuclear localization signal prediction algorithm based on a dual-recommendation system of frequent patterns and machine learning.

Background

[0002] Nuclear localization signals (NLS) are protein peptides that bind to carrier proteins to transport nuclear proteins into the nucleus, and they serve as important information for nuclear localization. Identifying NLSs can help elucidate protein function. However, experimental identification of such signals is expensive, and only a limited number of NLSs have been identified so far. It is therefore important to develop algorithms for computational prediction of NLSs.

[0003] Several NLS prediction methods already exist, such as PSORT II, PredictNLS, NLStradamus, cNLS Mapper, NucImport, and seqNLS. PSORT II predicts N...


Application Information

IPC(8): G16B30/00, G16B35/00
Inventor: Shen Hongbin (沈红斌), Guo Yun (郭芸)
Owner: SHANGHAI JIAO TONG UNIV