Cancer subtype precise discovery and evolution analysis method based on data stream clustering

A data stream clustering and analysis method, applied to cancer subtype discovery and evolution analysis, that addresses problems such as obstacles to producing clustering results and achieves high precision.

Active Publication Date: 2017-10-27
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

Based on a statistical model, the SAMBA algorithm transforms the bi-clustering problem into a search for the maximum-weight subgraph in a bipartite graph. It opens up a new line of thinking for the study of clustering tec...

Examples

Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings.

[0039] Referring to Figure 1 and Figure 2, a method for precise discovery and evolution analysis of cancer subtypes based on data stream clustering includes the following steps:

[0040] (a) Initialization of the gene expression data stream. Preprocess the gene data stream: analyze the dimensional information of the stream and determine how the similarity distance is calculated; establish the grid cells for the gene data stream objects and place the data into the grid window by window to complete initialization; construct a non-uniform attenuation model, which determines the non-uniform attenuation parameters of the data stream during the online process and the method for updating the grid density information.
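The initialization in step (a) can be illustrated with a minimal sketch. The excerpt does not specify the non-uniform attenuation model, so the sketch below swaps in a plain exponential decay as a stand-in; the names `cell_of`, `DecayingGrid`, `init_grid`, `cell_width` and `decay` are hypothetical and do not come from the patent.

```python
import math


def cell_of(point, cell_width):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / cell_width) for x in point)


class DecayingGrid:
    """Grid of cells whose densities fade over time.

    Assumption: uniform exponential decay, used here only as a placeholder
    for the patent's non-uniform attenuation model, which the excerpt
    does not describe.
    """

    def __init__(self, cell_width, decay=0.5):
        self.cell_width = cell_width
        self.decay = decay        # hypothetical decay factor in (0, 1)
        self.density = {}         # cell -> density at its last update
        self.last_update = {}     # cell -> time of its last update

    def insert(self, point, t):
        """Decay the cell's stored density up to time t, then add the new point."""
        key = cell_of(point, self.cell_width)
        elapsed = t - self.last_update.get(key, t)
        faded = self.density.get(key, 0.0) * self.decay ** elapsed
        self.density[key] = faded + 1.0
        self.last_update[key] = t
        return key


def init_grid(window, cell_width, decay=0.5):
    """Fill a grid from the first window of the stream (one point per tick)."""
    grid = DecayingGrid(cell_width, decay)
    for t, point in enumerate(window):
        grid.insert(point, t)
    return grid
```

Storing only the density and the time of the last update per cell means a cell is touched only when a point lands in it, which keeps the per-point cost constant.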

[0041] (b) Online real-time clustering of gene expression data streams. To meet real-time clustering requirements, each arriving data point is put...
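As a rough illustration of the online step, the sketch below places each arriving point into its grid cell, updates that cell's density, and removes sparse cells at fixed time nodes. It is a sketch under stated assumptions, not the patented procedure: the exponential decay, the pruning interval `prune_every` and the threshold `min_density` are all hypothetical choices, since the excerpt does not give the actual maintenance rules.

```python
import math


def process_stream(points, cell_width=1.0, decay=0.9,
                   prune_every=4, min_density=0.5):
    """Online grid maintenance over a stream of points (one point per tick)."""
    density, last = {}, {}
    for t, point in enumerate(points, start=1):
        # Put the arriving point into its grid cell.
        key = tuple(math.floor(x / cell_width) for x in point)
        elapsed = t - last.get(key, t)
        density[key] = density.get(key, 0.0) * decay ** elapsed + 1.0
        last[key] = t

        # At each time node, bring every cell up to date and drop sparse ones.
        if t % prune_every == 0:
            for k in list(density):
                current = density[k] * decay ** (t - last[k])
                if current < min_density:
                    del density[k]
                    del last[k]
                else:
                    density[k] = current
                    last[k] = t
    return density
```

For example, `process_stream([(0.1,), (5.2,), (0.3,), (0.4,)], decay=0.5)` keeps the cell that received three points and discards the cell that received a single early point, because that cell's density has decayed below the threshold by the time of the pruning pass.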

Abstract

The invention provides a high-precision method for the precise discovery and evolution analysis of cancer subtypes based on data stream clustering. The method comprises the following steps: (a) initialization of the gene expression data stream; (b) online real-time clustering of the gene expression data stream: each arriving data point is put into its corresponding grid cell, the grid is maintained online, and when a specified time node is reached, sparse grids are deleted according to the grid density information; (c) offline precise clustering of the gene expression data stream: each grid is treated as a virtual data point carrying density information, the virtual data points are clustered using a clustering method based on the density-distance distribution, the remaining data points are quickly assigned to clusters according to the density information of the determined cluster centers, and the clustering result is output; and (d) cluster evolution and migration analysis.
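Step (c) can be sketched as follows, assuming that "density-distance distribution" refers to the density-peaks idea: a cluster center is a point with high density that is also far from any denser point. That reading is an assumption, as are the function name `density_peak_clusters`, the fixed `n_clusters` parameter and the product score used to pick centers; the patent's own center-selection rule is not given in this excerpt.

```python
import math


def density_peak_clusters(cells, n_clusters):
    """Cluster grid cells treated as virtual points.

    cells: dict mapping a cell centre (tuple of floats) to its density.
    Returns a dict mapping each cell centre to a cluster label.
    """
    # Visit virtual points from densest to sparsest.
    pts = sorted(cells, key=lambda c: -cells[c])
    delta, parent = {}, {}
    for i, p in enumerate(pts):
        if i == 0:
            # Densest point: use its largest distance to any other point.
            delta[p] = max((math.dist(p, q) for q in pts[1:]), default=0.0)
            parent[p] = None
        else:
            # Distance to the nearest point of higher density.
            nearest = min(pts[:i], key=lambda q: math.dist(p, q))
            delta[p] = math.dist(p, nearest)
            parent[p] = nearest

    # Hypothetical selection rule: largest density * distance scores.
    centers = sorted(pts, key=lambda p: -cells[p] * delta[p])[:n_clusters]
    labels = {c: k for k, c in enumerate(centers)}

    # Every other point inherits the label of its nearest denser neighbour.
    for p in pts:
        if p not in labels:
            labels[p] = labels[parent[p]]
    return labels
```

Because points are labelled in decreasing order of density, each point's nearest denser neighbour already has a label when the point is reached, so the assignment needs only one pass.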

Description

Technical field

[0001] The invention relates to a cancer subtype discovery and evolution analysis method based on data stream clustering.

Background

[0002] Identification of cancer subtypes plays an important role in revealing disease pathogenesis and facilitating personalized therapy. After decades of research, uncertainties remain in the clinical diagnosis of cancer and the identification of tumor-specific markers. Therefore, the study of efficient biological data mining methods has become an important direction and an urgent need for the development of bioinformatics.

[0003] As an advanced data analysis and knowledge discovery technology, cluster analysis has been successfully applied in many fields. In bioinformatics, the technology has also shown great potential. In gene expression data analysis in particular, cluster analysis has been widely used and has become one of the main technical means. Regardless of the clustering algorithm, it is fir...

Application Information

IPC(8): G06F19/20, G06F19/24, G06F19/00
CPC: G16B25/00, G16B40/00
Inventor: 陈晋音, 郑海斌, 林翔, 熊晖, 李南, 应时彦
Owner ZHEJIANG UNIV OF TECH