Gene selection and cancer classification methods based on Monte Carlo and nonnegative matrix factorization

A non-negative matrix factorization technique, applied in the field of chemometrics, that addresses the loss of important information from the original gene data.

Inactive Publication Date: 2017-07-25
NORTHWEST NORMAL UNIVERSITY +1

AI Technical Summary

Problems solved by technology

However, the sparse basis vectors of the gene data matrix obtained by enforcing sparseness lose much of the important information in the original gene data, and the greater the sparsity, the more information is lost.

Examples

Embodiment Construction

[0014] The present invention will be described in detail below in combination with specific embodiments.

[0015] The non-negative matrix factorization method decomposes multivariate m × n data V into two non-negative matrices W and H, namely:

[0016] $$V \approx WH \qquad (1)$$

[0017] In formula (1), W is an m × r matrix and H is an r × n matrix; r is a positive integer less than or equal to both m and n, and is generally taken as the rank of matrix V. If H is taken as the basis matrix, then W is the coefficient matrix. The multiplicative update rules are as follows:

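[0018] $$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^{T}V)_{a\mu}}{(W^{T}WH)_{a\mu}}$$

[0019] $$W_{ia} \leftarrow W_{ia} \frac{(VH^{T})_{ia}}{(WHH^{T})_{ia}}$$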

[0020] As the above iterative process continues, the distance $\|V - WH\|_F$ keeps decreasing, where $\|\cdot\|_F$ denotes the Frobenius norm (F-norm). The iteration continues until a convergence criterion is met, e.g., the distance changes only slightly from one iteration to the next. After convergence is reached, the vectors in the basis matrix tend to be sparse. Important genes can be found through sparse basis m...
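The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the standard multiplicative update rules for the Frobenius-norm objective, and the function names (`nmf`, `frobenius_distance`), the random initialization, the `tol` stopping threshold, and the small `eps` guard against division by zero are all hypothetical choices made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_distance(v, w, h):
    """Return ||V - WH||_F."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, r, max_iter=500, tol=1e-6, seed=0, eps=1e-12):
    """Factor a non-negative m x n matrix V into W (m x r) and H (r x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start (an assumption; the source does not say).
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    history = [frobenius_distance(v, w, h)]
    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(m)]
        history.append(frobenius_distance(v, w, h))
        # Stop when the distance changes only slightly between iterations.
        if abs(history[-2] - history[-1]) < tol:
            break
    return w, h, history
```

Calling `nmf(V, r)` on a small non-negative matrix returns the two factors together with the distance recorded after each iteration, which makes it easy to check that the distance does not increase as the iteration proceeds.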

Abstract

A gene selection and cancer classification method based on Monte Carlo and non-negative matrix factorization. The Monte Carlo method is used to generate multiple gene subsets from the gene expression data of the original samples, and each subset is decomposed into a coefficient matrix and a basis matrix by non-negative matrix factorization. In each non-negative matrix factorization iteration, if the sparsity of a sample in the basis matrix is less than the minimum sparsity of the original samples, the elements of that sample are replaced by zeros one at a time, from smallest to largest, until its sparsity is no longer less than the minimum sparsity of the original samples. Convergence is then checked. After the iteration converges, a gene score is used to judge the importance of each gene: the genes are arranged in descending order of score, genes are added from this sequence one by one to build a series of models, and each model is validated by ten-fold cross-validation. The model with the best accuracy is used for prediction. The method can effectively identify biomarkers among the genes, and the model built from the identified biomarkers can be used to effectively predict the phenotypes of new cancer samples.
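Two steps of the abstract, the Monte Carlo generation of gene subsets and the sparsity constraint, can be sketched as follows. This is an illustration under stated assumptions: the source does not define its sparsity measure here, so the sketch assumes sparsity is the fraction of zero entries in a vector, and it assumes subsets are drawn uniformly without replacement. The names `sparsity`, `enforce_min_sparsity`, and `monte_carlo_subsets` are hypothetical.

```python
import random


def sparsity(vec):
    """Fraction of zero entries (an assumed measure, not taken from the source)."""
    return sum(1 for x in vec if x == 0) / len(vec)


def enforce_min_sparsity(vec, min_sparsity):
    """Zero the smallest nonzero entries until sparsity >= min_sparsity."""
    out = list(vec)
    # Indices of nonzero entries, ordered from smallest value to largest.
    order = sorted((i for i, x in enumerate(out) if x != 0), key=lambda i: out[i])
    for i in order:
        if sparsity(out) >= min_sparsity:
            break
        out[i] = 0.0
    return out


def monte_carlo_subsets(n_genes, n_subsets, subset_size, seed=0):
    """Draw random gene-index subsets (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_genes), subset_size))
            for _ in range(n_subsets)]
```

In the method described, `min_sparsity` would be the minimum sparsity observed among the original samples, and the constraint would be applied to each basis-matrix vector inside every factorization iteration.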

Description

Technical field

[0001] The invention belongs to the technical field of chemometrics, and relates to a gene selection and cancer classification method based on Monte Carlo and non-negative matrix factorization.

Background art

[0002] Cancer classification is a key issue in clinical research to identify biomarkers and cure malignancies. Gene expression profiles obtained by microarray gene chip technology have been successfully applied to identify biomarkers and classify cancer samples.

[0003] Gene expression profiles reflect biological information through a large amount of genetic data. All genetic data in gene expression profiles can be considered as potential biomarkers. Some important biomarkers in genetic data and features of genetic data can be used to accurately predict the phenotype of new tumors. However, if all the data is used, classification will run into problems with high-dimensional data.

[0004] Nonnegative matrix factorization (NMF) can generate n...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F19/18
Inventor: 陈晶, 张苗, 邵学广
Owner: NORTHWEST NORMAL UNIVERSITY