Mixed feature data clustering method and system based on tree base learner

A mixed-feature data clustering technology, applied in machine learning, instrumentation, and character and pattern recognition, that improves the quality of clustering.

Pending Publication Date: 2022-02-01
浙江浙石油综合能源销售有限公司 +1
Cites: 1 · Cited by: 2

AI Technical Summary

Problems solved by technology

Existing clustering methods rely on Euclidean distance, which cannot accurately cluster high-dimensional data with mixed continuous and discrete features.
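The distance-concentration effect behind this problem can be illustrated with a small numerical experiment. The sketch below is a hypothetical demonstration on uniform random points, not the patent's vehicle data: it measures the relative contrast (max distance - min distance) / min distance from one query point to 500 random points, which shrinks as the dimension grows. The function name `relative_contrast` is illustrative.

```python
import numpy as np


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """(max - min) / min of the Euclidean distances from one random query point
    to n_points uniform random points in the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


# The nearest and farthest points become almost equally far away as dim grows.
for d in (2, 10, 100, 1000):
    print(f"dim={d:5d}  relative contrast={relative_contrast(d):.3f}")
```

When the contrast approaches zero, "nearest" and "farthest" carry little information, which is why distance-based clustering degrades on high-dimensional data.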

Method used

Randomly sub-sample the sample set into N sub-sample sets, train a tree-based learner on each to obtain N trees and the number of clusters K, derive normalized pairwise similarity matrices from the trained trees, and feed K and these matrices into a spectral clustering model to obtain the final clustering result.

Examples

Embodiment 1

[0055] This embodiment describes the specific operating steps of the solution of the present invention and verifies its effect using vehicle data collected at an energy supply station. The number and types of features in the vehicle data set are shown in Table 1 below:

[0056]

[0057] Table 1 Vehicle Data Sheet

[0058] Referring to Figure 1, this embodiment provides a method for clustering mixed-feature data based on a tree-based learner, comprising the following steps:

[0059] S1. Perform random sub-sampling on the sample set to generate N different sub-sample sets;

[0060] S2. Train a tree-based learner on each sub-sample set; after training is completed, obtain N trees and the number of clusters K;

[0061] S3. Based on the N trained trees, count the similarity between every pair of samples to form similarity matrices, and normalize all the similarity matrices to obtain multiple normalized similarity matrices;

[0062] S4. The number K of clusters and ...
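Steps S1 to S3 can be sketched in Python with NumPy and scikit-learn. The extract does not say which tree-based learner is used, how large each sub-sample is, or exactly how the pairwise similarity is counted, so this is a minimal sketch under stated assumptions rather than the patented implementation: (a) each tree is trained with the real-versus-synthetic trick known from unsupervised random forests, (b) the similarity of two samples is the fraction of trees in which they reach the same leaf, and (c) discrete features have already been integer-coded so that `DecisionTreeClassifier` can split on them. The helper names `subsample`, `train_tree`, and `leaf_similarity`, the 70% sub-sampling ratio, and `min_samples_leaf=5` are hypothetical. Deriving K, which S2 also produces, is not covered here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def subsample(X, n_sets, frac=0.7, seed=0):
    """S1: draw n_sets random sub-samples of the rows of X (without replacement)."""
    rng = np.random.default_rng(seed)
    size = max(2, int(frac * len(X)))
    return [rng.choice(len(X), size=size, replace=False) for _ in range(n_sets)]


def train_tree(X_sub, seed=0, min_samples_leaf=5):
    """S2 (assumed variant): fit a tree that separates the real sub-sample from a
    synthetic copy whose columns are shuffled independently. Shuffling keeps each
    feature's marginal distribution but breaks the dependence between features,
    so the learned splits follow the joint structure of the real data."""
    rng = np.random.default_rng(seed)
    synthetic = np.column_stack([rng.permutation(col) for col in X_sub.T])
    X_fit = np.vstack([X_sub, synthetic])
    y_fit = np.r_[np.ones(len(X_sub)), np.zeros(len(synthetic))]
    tree = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf, random_state=seed)
    return tree.fit(X_fit, y_fit)


def leaf_similarity(trees, X):
    """S3 (assumed): S[i, j] = fraction of trees in which samples i and j fall into
    the same leaf, i.e. co-occurrence counts normalized by the number of trees."""
    S = np.zeros((len(X), len(X)))
    for tree in trees:
        leaves = tree.apply(X)  # leaf index reached by every sample in this tree
        S += leaves[:, None] == leaves[None, :]
    return S / len(trees)


# Usage on hypothetical data: two well-separated groups of 30 samples each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 4)), rng.normal(6.0, 1.0, (30, 4))])
index_sets = subsample(X, n_sets=20)
trees = [train_tree(X[idx], seed=t) for t, idx in enumerate(index_sets)]
S = leaf_similarity(trees, X)
```

Averaging leaf co-occurrence across trees yields one symmetric matrix with values in [0, 1] and ones on the diagonal. The patent speaks of several normalized matrices; the per-tree 0/1 co-occurrence matrices accumulated inside `leaf_similarity` are one possible reading of that.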

Embodiment 2

[0093] Referring to Figure 7, this embodiment provides a mixed-feature data clustering system based on a tree-based learner, comprising a sub-sample set generation module, a tree-based learning module, a similarity matrix module, and a clustering module connected in sequence; the clustering module is also connected to the tree-based learning module;

[0094] The sub-sample set generation module is configured to perform random sub-sampling on the sample set to generate N different sub-sample sets;

[0095] The tree-based learning module is configured to train a tree-based learner on each sub-sample set and, after training is completed, obtain N trees and the number of clusters K;

[0096] The similarity matrix module is configured to count the similarity between every pair of samples based on the N trained trees, and to normalize all the similarity matrices to obtain multiple normalized similarity matrices;

[0097] The clustering module is configured to use the number ...
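The module layout of Embodiment 2 can be sketched as a small Python class in which each module is a pluggable function. This is a hypothetical illustration of the data flow only, and the class and field names are invented for the example: the sub-sample sets feed the tree-based learning module, its trees feed the similarity matrix module, and K is passed straight to the clustering module, reflecting the extra connection between the clustering module and the tree-based learning module.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class TreeClusteringSystem:
    """Hypothetical wiring of the four modules described in Embodiment 2.
    Each field is a pluggable callable standing in for one module."""

    subsample_module: Callable[[Any], List[Any]]                        # samples -> N sub-sample sets
    tree_learning_module: Callable[[List[Any]], Tuple[List[Any], int]]  # sub-sample sets -> (N trees, K)
    similarity_module: Callable[[List[Any], Any], List[Any]]            # (trees, samples) -> normalized matrices
    clustering_module: Callable[[int, List[Any]], Sequence[int]]        # (K, matrices) -> cluster labels

    def run(self, samples: Any) -> Sequence[int]:
        subsets = self.subsample_module(samples)
        trees, k = self.tree_learning_module(subsets)
        matrices = self.similarity_module(trees, samples)
        # K is handed straight from the tree-based learning module to the
        # clustering module, mirroring the extra link between those two modules.
        return self.clustering_module(k, matrices)


# Usage with trivial stand-in modules (placeholders, not real learners)
system = TreeClusteringSystem(
    subsample_module=lambda s: [s[:2], s[2:]],
    tree_learning_module=lambda subsets: (["tree"] * len(subsets), 2),
    similarity_module=lambda trees, s: [[[1.0]]] * len(trees),
    clustering_module=lambda k, mats: [i % k for i in range(4)],
)
print(system.run([10, 20, 30, 40]))  # → [0, 1, 0, 1]
```

Keeping each module behind a plain callable lets a real learner or spectral clustering routine be swapped in without changing the wiring.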

Abstract

The invention belongs to the technical field of mixed-feature data set clustering and discloses a mixed-feature data clustering method and system based on a tree-based learner. The method comprises the following steps: S1, performing random sub-sampling on a sample set to generate N different sub-sample sets; S2, training a tree-based learner on each sub-sample set to obtain N trained trees and the number of clusters K; S3, counting the similarity between every pair of samples based on the N trained trees to form similarity matrices, and normalizing all the similarity matrices to obtain a plurality of normalized similarity matrices; and S4, taking the number of clusters K and the plurality of normalized similarity matrices as the input of a spectral clustering model to obtain the final clustering result of the sample set. The method is designed for data with high dimensionality and mixed features, and it addresses the difficulty of clustering when a notion of similarity cannot be clearly defined because the data set has too many dimensions and mixes continuous and discrete features.
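Step S4 can be sketched with scikit-learn's `SpectralClustering`, which accepts a precomputed affinity matrix via `affinity="precomputed"`. The abstract does not state how the several normalized similarity matrices are combined before clustering; this sketch assumes an element-wise mean, and the function name `spectral_cluster` is hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def spectral_cluster(k, similarity_matrices, seed=0):
    """S4 sketch: merge the normalized similarity matrices by an element-wise mean
    (an assumption) and split the merged affinity graph into k clusters."""
    affinity = np.mean(np.stack(similarity_matrices), axis=0)
    affinity = (affinity + affinity.T) / 2  # enforce symmetry for the affinity graph
    model = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=seed)
    return model.fit_predict(affinity)


# Usage on a hypothetical 6-sample affinity with two obvious blocks
block = np.full((6, 6), 0.05)
block[:3, :3] = 0.9
block[3:, 3:] = 0.9
np.fill_diagonal(block, 1.0)
labels = spectral_cluster(2, [block, block])
print(labels)
```

The small off-block value (0.05) keeps the affinity graph connected, which spectral clustering generally expects; cluster label numbering itself is arbitrary.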

Description

Technical field

[0001] The invention belongs to the technical field of clustering mixed-feature data sets, and in particular relates to a method and system for clustering mixed-feature data based on a tree-based learner.

Background

[0002] For the vehicle data set of an energy supply station, the high dimensionality of the features and the mixture of continuous and discrete features challenge traditional clustering algorithms, especially those based on Euclidean distance. Under the "curse of dimensionality", all samples become approximately equidistant from one another, so the nearest-neighbor problem may become meaningless. Second, many algorithms that rely on traditional distance measures are sensitive to attributes measured in different units. Data transformation can alleviate this problem, but it may change the distribution of the data and thereby affect the clustering results. Moreover, in the case of large data, an excess...
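The unit-sensitivity problem can be shown with a toy example. The records below are hypothetical (two features, fuel volume in litres and trip length), not taken from Table 1: expressing trip length in metres or in kilometres changes which record is the Euclidean nearest neighbor of the first one. The helper `nearest_neighbor` is illustrative.

```python
import numpy as np


def nearest_neighbor(X: np.ndarray, i: int) -> int:
    """Index of the Euclidean nearest neighbor of row i (excluding row i itself)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return int(np.argmin(d))


# Hypothetical vehicle records: [fuel volume in litres, trip length in metres]
trip_m = np.array([
    [40.0, 12000.0],
    [41.0, 30000.0],
    [70.0, 12500.0],
])
# The same records with trip length converted to kilometres
trip_km = trip_m * np.array([1.0, 1e-3])

print(nearest_neighbor(trip_m, 0))   # → 2 (trip length dominates the distance)
print(nearest_neighbor(trip_km, 0))  # → 1 (fuel volume now dominates)
```

By contrast, a tree split compares a single feature with a threshold, so (in exact arithmetic) rescaling a feature by a positive constant only rescales the threshold and leaves the partition of the samples unchanged; this is one commonly cited reason to derive similarity from trees rather than from raw distances.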

Application Information

Patent type & authority: Application (China)
IPC(8): G06K9/62; G06N20/00
CPC: G06N20/00; G06F18/2323; G06F18/22
Inventors: 范庆来, 倪勇龙, 陈义, 周君良, 钱至远, 朱霄, 蒋肇标, 郭庆
Owner: 浙江浙石油综合能源销售有限公司