Mixed feature data clustering method and system based on tree base learner

A technology for clustering mixed-feature data, applied in machine learning, instrumentation, and character and pattern recognition, that improves clustering quality.

Pending Publication Date: 2022-02-01

浙江浙石油综合能源销售有限公司 and one other applicant

Cites: 1; Cited by: 2

AI Technical Summary

Problems solved by technology

Clustering algorithms based on Euclidean distance cannot accurately cluster high-dimensional data in which continuous and discrete features are mixed.
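One reason Euclidean distance struggles here is that it depends on the units chosen for each feature. The toy example below is not from the patent: the vehicle records and feature names are hypothetical. It shows the nearest neighbour of a record changing when a single feature is converted from litres to millilitres.

```python
import math

# Hypothetical vehicle records: (tank volume in litres, daily mileage in km).
vehicles = {"A": (50.0, 120.0), "B": (60.0, 121.0), "C": (51.0, 200.0)}

def nearest_to(name: str, volume_scale: float = 1.0) -> str:
    """Nearest neighbour of `name` by Euclidean distance after rescaling the volume feature."""
    def scaled(v):
        return (v[0] * volume_scale, v[1])
    query = scaled(vehicles[name])
    others = [n for n in vehicles if n != name]
    return min(others, key=lambda n: math.dist(query, scaled(vehicles[n])))

print(nearest_to("A"))                     # B  (volume in litres)
print(nearest_to("A", volume_scale=1000))  # C  (same data, volume in millilitres)
```

Nothing about the vehicles changed between the two calls; only the unit did, yet the neighbourhood structure a distance-based clusterer would see is different.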

Method used

The sample set is randomly sub-sampled into N sub-sample sets; a tree-based learner is trained on each, yielding N trees and the number of clusters K; similarity matrices between every pair of samples are computed from the trees and normalized; and K together with the normalized matrices is fed to a spectral clustering model to produce the final clustering result.

Examples

Embodiment 1

[0055] This embodiment describes the specific operating steps and verifies the effect of the invention using vehicle data collected at an energy supply station. The number and types of features in the vehicle data set are shown in Table 1 below:

[0056]

[0057] Table 1: Vehicle data set

[0058] Referring to Figure 1, this embodiment provides a method for clustering mixed-feature data based on a tree-based learner, comprising the following steps:

[0059] S1. Perform random sub-sampling on the sample set to generate N different sub-sample sets;

[0060] S2. Train a tree-based learner on each sub-sample set; after training, obtain N trees and the number of clusters K;

[0061] S3. Based on the N trained trees, compute the similarity matrices between every pair of samples, and normalize all of them to obtain multiple normalized similarity matrices;

[0062] S4. The number K of clusters and ...
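Steps S1 to S3 can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the text does not specify the tree-based learner, how K is obtained from training, or the normalization used. The sketch therefore assumes a hypothetical randomized tree that splits continuous features on a random threshold and discrete features on an equality test, counts two samples as similar under a tree when they land in the same leaf, and applies symmetric degree normalization D^{-1/2} S D^{-1/2}. K is left out because its derivation is elided. All function names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence, Union

Value = Union[float, int, str]

def subsample(n_rows: int, n_sets: int, frac: float, rng: random.Random) -> List[List[int]]:
    """S1 (sketch): N random sub-samples of row indices, drawn without replacement."""
    size = max(2, int(n_rows * frac))
    return [rng.sample(range(n_rows), size) for _ in range(n_sets)]

def goes_left(node: dict, row: Sequence[Value]) -> bool:
    # Discrete feature: equality test. Continuous feature: threshold test.
    if node["kind"] == "cat":
        return row[node["feature"]] == node["value"]
    return row[node["feature"]] <= node["value"]

def build_tree(rows, idx, depth, max_depth, rng) -> Optional[dict]:
    """S2 (hypothetical learner): a randomized tree that splits mixed features."""
    if depth >= max_depth or len(idx) < 2:
        return None  # leaf
    f = rng.randrange(len(rows[idx[0]]))
    vals = [rows[i][f] for i in idx]
    if isinstance(vals[0], str):
        node = {"kind": "cat", "feature": f, "value": rng.choice(vals)}
    elif min(vals) < max(vals):
        node = {"kind": "num", "feature": f, "value": rng.uniform(min(vals), max(vals))}
    else:
        return None  # chosen feature is constant at this node: stop splitting
    left = [i for i in idx if goes_left(node, rows[i])]
    right = [i for i in idx if not goes_left(node, rows[i])]
    node["left"] = build_tree(rows, left, depth + 1, max_depth, rng)
    node["right"] = build_tree(rows, right, depth + 1, max_depth, rng)
    return node

def leaf_id(node: Optional[dict], row: Sequence[Value]) -> str:
    # The root-to-leaf path uniquely identifies the leaf a row falls into.
    path = ""
    while node is not None:
        left = goes_left(node, row)
        path += "L" if left else "R"
        node = node["left"] if left else node["right"]
    return path

def normalized_similarity(tree, rows) -> List[List[float]]:
    """S3 (sketch): S[i][j] = 1 if rows i and j share a leaf, then D^{-1/2} S D^{-1/2}."""
    leaves = [leaf_id(tree, r) for r in rows]
    n = len(rows)
    s = [[1.0 if leaves[i] == leaves[j] else 0.0 for j in range(n)] for i in range(n)]
    deg = [sum(r) for r in s]  # >= 1, since every row shares a leaf with itself
    return [[s[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def similarity_matrices(rows, n_trees=5, frac=0.7, max_depth=4, seed=0):
    rng = random.Random(seed)
    mats = []
    for idx in subsample(len(rows), n_trees, frac, rng):
        tree = build_tree(rows, idx, 0, max_depth, rng)  # fit on the sub-sample only
        mats.append(normalized_similarity(tree, rows))   # but score every sample
    return mats

# Hypothetical mixed rows: (continuous feature, discrete feature).
rows = [(1.0, "diesel"), (1.2, "diesel"), (9.8, "gasoline"), (10.1, "gasoline")]
mats = similarity_matrices(rows, n_trees=3)
print(len(mats), len(mats[0]))  # 3 4
```

Fitting each tree on its sub-sample but scoring every sample keeps all N matrices the same size, which any later step that combines them needs.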

Embodiment 2

[0093] Referring to Figure 7, this embodiment provides a mixed-feature data clustering system based on a tree-based learner, comprising a sub-sample set generation module, a tree-based learning module, a similarity matrix module, and a clustering module connected in sequence; the clustering module is also connected to the tree-based learning module;

[0094] The sub-sample set generation module is configured to perform random sub-sampling on the sample set to generate N different sub-sample sets;

[0095] The tree-based learning module is configured to train a tree-based learner on each sub-sample set and, after training, obtain N trees and the number of clusters K;

[0096] The similarity matrix module is configured to compute, based on the N trained trees, the similarity matrices between every pair of samples, and to normalize all of them to obtain multiple normalized similarity matrices;

[0097] The clustering module is configured to use the number ...
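The module wiring of Embodiment 2 can be sketched as a small Python class. The four callables are hypothetical placeholders, and the stubs in the usage example stand in for real implementations. What the sketch is meant to show is the data flow described in [0093]: the modules run in sequence, and the clustering module receives K directly from the tree-based learning module in addition to the matrices from the similarity matrix module.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]

@dataclass
class ClusteringSystem:
    """Hypothetical wiring of the four modules; field names are illustrative only."""
    subsample_module: Callable[[Sequence], List[List[int]]]
    tree_module: Callable[[Sequence, List[List[int]]], Tuple[list, int]]  # -> (N trees, K)
    similarity_module: Callable[[Sequence, list], List[Matrix]]
    clustering_module: Callable[[List[Matrix], int], List[int]]

    def run(self, rows: Sequence) -> List[int]:
        subsets = self.subsample_module(rows)       # sub-sample set generation
        trees, k = self.tree_module(rows, subsets)  # tree-based learning
        mats = self.similarity_module(rows, trees)  # similarity matrices
        # The clustering module has two inputs: the matrices from the similarity
        # module and K taken directly from the tree-based learning module.
        return self.clustering_module(mats, k)

# Usage with trivial stubs in place of real modules.
system = ClusteringSystem(
    subsample_module=lambda rows: [[0, 1], [1, 2]],
    tree_module=lambda rows, subsets: (["tree"] * len(subsets), 2),
    similarity_module=lambda rows, trees: [[[1.0] * len(rows) for _ in rows] for _ in trees],
    clustering_module=lambda mats, k: [i % k for i in range(len(mats[0]))],
)
print(system.run(["a", "b", "c"]))  # [0, 1, 0]
```

Injecting the modules as callables keeps the topology fixed while letting each step be swapped independently.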

Abstract

The invention belongs to the technical field of clustering mixed-feature data sets and discloses a mixed-feature data clustering method and system based on a tree-based learner. The method comprises the following steps: S1, performing random sub-sampling on a sample set to generate N different sub-sample sets; S2, training a tree-based learner on each sub-sample set to obtain N trained trees and the number of clusters K; S3, computing similarity matrices between every pair of samples based on the N trained trees, and normalizing all the similarity matrices to obtain a plurality of normalized similarity matrices; and S4, taking the number of clusters K and the plurality of normalized similarity matrices as input to a spectral clustering model to obtain the final clustering result for the sample set. The method is designed for clustering high-dimensional data with mixed features, and addresses the difficulty of clustering when the data set's dimensionality is very high and continuous and discrete features are mixed, so that a notion of similarity cannot be clearly defined.
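Step S4 can be sketched with NumPy. The abstract does not say how several normalized matrices are combined inside the spectral clustering model, so this sketch assumes the simplest option, averaging them into one affinity matrix, and then applies the standard normalized spectral clustering recipe: take the top-K eigenvectors of D^{-1/2} A D^{-1/2}, normalize each row, and run k-means. The k-means step uses a deterministic farthest-point initialization. The function name and the example matrix are hypothetical.

```python
import numpy as np

def spectral_cluster(mats, k, n_iter=100):
    """Cluster samples from several normalized similarity matrices (sketch)."""
    # Assumption: fuse the matrices by simple averaging.
    a = np.mean(np.stack([np.asarray(m, dtype=float) for m in mats]), axis=0)
    a = (a + a.T) / 2.0                                  # enforce symmetry
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    m = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]    # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(m)                          # eigenvalues in ascending order
    u = vecs[:, -k:]                                     # eigenvectors of the k largest
    u = u / np.linalg.norm(u, axis=1, keepdims=True)     # rows onto the unit sphere

    # k-means with farthest-point initialization, so results are deterministic.
    centers = [u[0]]
    for _ in range(1, k):
        dist = np.min([((u - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(u[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(u), dtype=int)
    for _ in range(n_iter):
        dist = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([u[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Two obvious groups {0, 1, 2} and {3, 4, 5}, weakly linked (hypothetical values).
block = np.full((6, 6), 0.1)
block[:3, :3] = 1.0
block[3:, 3:] = 1.0
labels = spectral_cluster([block, block], k=2)
print(labels)  # [0 0 0 1 1 1]
```

Averaging is only one way to fuse the matrices; a multi-view spectral method could weight or co-regularize them instead, and the patent text available here does not say which is used.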

Description

Technical field

[0001] The invention belongs to the technical field of clustering mixed-feature data sets, and in particular relates to a method and system for clustering mixed-feature data based on a tree-based learner.

Background

[0002] In the vehicle data set of an energy supply station, the high dimensionality of the data features and the mixture of continuous and discrete features pose challenges to traditional clustering algorithms, especially those based on Euclidean distance. First, under the "curse of dimensionality", all samples become approximately equidistant from one another, so that the nearest-neighbor problem may become meaningless. Second, many algorithms that rely on traditional distance measures are sensitive to attributes measured in different units. Although data transformation can alleviate this problem, it may change the distribution of the data and affect the clustering results. Moreover, in the case of large data, an excess...
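The distance-concentration effect described in the background can be checked numerically. This sketch is not from the patent: it draws uniformly random points and computes the relative contrast (d_max - d_min) / d_min of their Euclidean distances to a random query point. As the dimension grows, the contrast shrinks, meaning the nearest and farthest points become almost equally far away.

```python
import math
import random

def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from one query to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]  # uniform in the unit hypercube
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```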

Application Information

Patent type and authority: Application (China)

IPC(8): G06K9/62, G06N20/00

CPC: G06N20/00, G06F18/2323, G06F18/22

Inventors: 范庆来, 倪勇龙, 陈义, 周君良, 钱至远, 朱霄, 蒋肇标, 郭庆

Owner: 浙江浙石油综合能源销售有限公司
