Data classification method based on improved local abnormal factor detection

A data classification technology based on local abnormal factor (local outlier factor, LOF) detection, applied in the field of data processing. It addresses the failure of earlier K-means improvements to take into account the correlation of data within clusters, which makes the accuracy of clustering results unstable and short of expected requirements, and it aims to improve clustering accuracy.

Pending Publication Date: 2019-08-02

GUIZHOU NORMAL UNIVERSITY

0 Cites, 13 Cited by


Problems solved by technology

Existing improvements to the K-means algorithm do not take into account the correlation of data within a cluster. As a result, the accuracy of the clustering results is often unstable and fails to meet the expected requirements.


Experimental example

[0086] Experimental example: to demonstrate the practicability of the inventive method, the specific steps are as follows:

[0087] Six public data sets from the UCI database (Iris, Wine, Seeds, Wifi Localization, CMC, and Abalone) are selected, and the results of the K-means++, FCM, OFMMK-means, and optimized algorithms are tested on each. A detailed description of the data sets used is given in Table 1.

[0088] Table 1: Data sets used in the experiments

[0089]

[0090] In the LOF algorithm, the parameter k_dist is the number of neighborhood points examined for each sample. The larger the value, the more sample points are selected, and the more easily the clustering accuracy is affected by the LOF value. The six data sets above are used to run the following experiments on the value of the parameter k_dist, as shown in Figure 1.
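To make the role of the neighborhood parameter concrete, the following is a minimal sketch of the standard (textbook) LOF computation, not the adaptive variant claimed by the patent, whose details are not given in this excerpt. The function name `lof_scores`, the toy points, and the simplification of taking exactly k neighbours (ignoring distance ties and duplicate points) are assumptions made for illustration.

```python
import math

def lof_scores(points, k):
    """Standard LOF per point, brute force, using exactly k nearest neighbours.

    Assumption: no duplicate points (a zero reachability sum is not handled).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []   # indices of the k nearest neighbours of each point
    k_distance = []   # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data (illustrative only): five clustered points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
for k in (2, 3, 4):
    scores = lof_scores(points, k)
    print(k, [round(s, 2) for s in scores])
```

Points inside the dense group score close to 1, while the distant point scores well above 1 for every k tried here. On real data the scores can shift noticeably as k changes, which is why the choice of k_dist is examined experimentally.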

[0091] Run the K-means++ algorithm, FCM algorithm, OFMMK-means algorithm, and the proposed optimization algorithm on the sample data sets Iris, Wine, Seeds, Wifi L...


Abstract

The invention discloses a data classification method based on improved local abnormal factor detection. The method comprises the steps of: outlier factor detection; similarity measurement; selection of the initial clustering center points, in which data with relatively small outlier factors are screened as a candidate set of initial clustering centers by a local outlier factor (LOF) detection algorithm that adaptively adjusts the k-distance parameter; and iterative optimization of the clustering centers. In the iteration stage of clustering center optimization, the outlier factors of the data are standardized using outlier standardization, so that the value range of the new outlier factor new_ri is greater than or equal to 1. The invention improves the accuracy of cluster center positioning and cluster division.
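The screening and standardization steps named in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the min-max shift in `normalise_scores`, the `keep_ratio` cut-off, and the farthest-first rule in `pick_initial_centres` are all hypothetical choices, and the outlier scores in the demo are made-up toy values rather than computed LOF values.

```python
import math

def normalise_scores(scores):
    """Shift and scale scores into [1, 2] so every new value is >= 1.

    Assumed form of the standardization; the source only states new_ri >= 1.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [1.0 + (s - lo) / (hi - lo) for s in scores]

def candidate_set(points, scores, keep_ratio=0.5):
    """Keep the fraction of points with the smallest outlier scores."""
    order = sorted(range(len(points)), key=lambda i: scores[i])
    n_keep = max(1, int(len(points) * keep_ratio))
    return [points[i] for i in order[:n_keep]]

def pick_initial_centres(candidates, k):
    """Farthest-first selection among the candidates (hypothetical rule)."""
    centres = [candidates[0]]
    while len(centres) < k:
        nxt = max(candidates, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(nxt)
    return centres

# Toy data: two small groups and one isolated point with a large score.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_scores = [1.0, 1.1, 1.05, 1.0, 1.1, 1.02, 8.0]  # illustrative, not measured

candidates = candidate_set(points, toy_scores, keep_ratio=0.6)
centres = pick_initial_centres(candidates, k=2)
print(candidates)
print(centres)
print(normalise_scores(toy_scores))
```

Because the isolated point has the largest score, it never enters the candidate set and so cannot be chosen as an initial center, which is the intent of screening on small outlier factors.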

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a data classification method based on improved detection of local abnormal factors.

Background technique

[0002] At present, the use of cluster analysis for data classification has become an indispensable technology in the field of data mining, with broad application prospects in commerce, insurance, biology, and e-commerce.

[0003] There are many kinds of clustering algorithms, including the K-means algorithm based on distance partitioning and FCM fuzzy clustering based on membership degree partitioning. Among them, the K-means algorithm is conceptually simple, easy to implement, and fast, but its clustering centers are easily affected by outliers and abnormal points, which can cause the clustering to fall into a local optimum. Therefore, the application and optimization of this algorithm in data classification h...
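The sensitivity of a K-means center to outliers, noted in the background above, follows from the center being an arithmetic mean. A small sketch with made-up coordinates (the helper name `centroid` is illustrative):

```python
def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Toy cluster (illustrative values) and a single distant point.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
outlier = (20.0, 20.0)

clean = centroid(cluster)
polluted = centroid(cluster + [outlier])
print(clean, polluted)
```

A single distant point moves the mean from the middle of the four cluster members to a location outside the cluster entirely. This is the motivation for keeping high-outlier-factor points out of the initial center candidates.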


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/28, G06K9/62

CPC: G06F16/285, G06F18/22

Inventor: 游子毅

Owner: GUIZHOU NORMAL UNIVERSITY
