Noise data removal method and implementation system of improved k-means algorithm

A noise data removal method based on an improved k-means algorithm, and a system implementing it. It addresses two problems of ordinary k-means: good clustering results cannot be guaranteed, and the accuracy of the results depends on the choice of initial cluster centers. The stated effects are high reliability, high accuracy, and stable results.

Active Publication Date: 2019-02-26
ZHEJIANG SCI-TECH UNIV

AI Technical Summary

Problems solved by technology

The ordinary k-means algorithm has a major disadvantage: its clustering results vary greatly with the randomly selected initial cluster centers. Good clustering results therefore cannot be guaranteed, and the accuracy of the final result depends on how the initial cluster centers are chosen.
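
To illustrate this sensitivity, the following minimal Python sketch runs a plain one-dimensional k-means (Lloyd's iteration) twice on the same data, changing only the initial centers. The data values, the function names `kmeans_1d` and `sse`, and the choice k = 3 are illustrative assumptions and are not taken from the patent.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Plain k-means on scalars, started from the given initial centers."""
    centers = [float(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its members.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # centers unchanged, so stop
            break
        centers = new_centers
    return centers, clusters


def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum((x - c) ** 2
               for c, members in zip(centers, clusters)
               for x in members)


# Made-up data: two tight groups near 0 and 1, plus groups at 10 and 20.
data = [0, 0, 1, 1, 10, 10, 20, 20]

centers_a, clusters_a = kmeans_1d(data, [0, 1, 10])   # one possible start
centers_b, clusters_b = kmeans_1d(data, [0, 10, 20])  # another start

print(centers_a, sse(centers_a, clusters_a))
print(centers_b, sse(centers_b, clusters_b))
```

With these inputs the first start converges to centers 0, 1 and 15 with a squared error of 100, while the second converges to 0.5, 10 and 20 with a squared error of 1. Both runs have converged, yet only the second separates the groups at 10 and 20.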

Examples

Embodiment 1

[0067] A total of 1443 air temperature readings were collected on one day during the cultivation of Flammulina velutipes (enoki mushroom) at a factory in Northeast China. Plotting these 1443 readings as a line graph shows that the air temperature fluctuates by no more than 1°C, so the environmental threshold is set to 1 and the number of cluster centers is set to 2.

[0068] A total of 1444 air temperature readings were collected on one day at node 2 of a warehouse during the cultivation of Flammulina velutipes (enoki mushroom) at a factory in Northeast China. Figure 4 shows the distribution of these 1444 readings. The data for that day reach a maximum at one point, and the peak lasts only a very short time. It is preliminarily judged to be caused by a sensor failure and needs to be removed.

[0069] The traditional k-means algorithm and the improved algorithm of the present invention are u...

Abstract

The invention discloses a method for removing noise data with an improved k-means algorithm. The method first selects k cluster centers using a farthest-first (farthest priority) strategy. It then clusters the air temperature data around those k centers, updating the centers as it goes, and stops clustering once the centers no longer change between successive iterations. Next it introduces an environmental threshold and compares the distance between every pair of cluster centers with that threshold. Among the clusters whose center distance is greater than the threshold, the one or several clusters containing the least data are deleted, which completes the removal of noise data. The invention also discloses a system implementing this noise data removal method. According to the abstract, the invention identifies and removes noisy data faster and more accurately.
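
The steps in the abstract can be sketched in Python as follows. This is one reading of the abstract under stated assumptions, not the patented implementation. The first center is taken to be the first sample (the text here does not say how it is chosen). Clusters are treated as deletion candidates when their center lies farther than the threshold from some other center, and only the smallest candidates are dropped, provided they are strictly smaller than the largest one. All function names and the sample readings are hypothetical.

```python
from itertools import combinations


def farthest_first_centers(data, k):
    """Pick k initial centers; each new one is the point farthest from those chosen."""
    centers = [data[0]]  # assumption: start from the first sample
    while len(centers) < k:
        centers.append(max(data, key=lambda x: min(abs(x - c) for c in centers)))
    return centers


def cluster(data, centers, max_iter=100):
    """Assign points and update centers until the centers stop changing."""
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: abs(x - centers[i]))
                  for x in data]
        new_centers = []
        for i, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def remove_noise(data, k, threshold):
    """Drop the smallest cluster(s) whose center is far from another center."""
    centers, labels = cluster(data, farthest_first_centers(data, k))
    sizes = [labels.count(i) for i in range(k)]
    # Clusters taking part in at least one pair farther apart than the threshold.
    candidates = set()
    for i, j in combinations(range(k), 2):
        if abs(centers[i] - centers[j]) > threshold:
            candidates.update((i, j))
    noisy = set()
    if candidates:
        smallest = min(sizes[i] for i in candidates)
        largest = max(sizes[i] for i in candidates)
        if smallest < largest:  # assumption: never delete when all sizes tie
            noisy = {i for i in candidates if sizes[i] == smallest}
    return [x for x, lab in zip(data, labels) if lab not in noisy]


# Hypothetical readings in degrees Celsius with a short spike.
readings = [20.1, 20.3, 20.2, 35.0, 35.2, 20.4, 20.0, 20.5]
cleaned = remove_noise(readings, k=2, threshold=1.0)
print(cleaned)
```

With k = 2 and a threshold of 1, the two spike readings form the smaller cluster and are removed, while a series that stays within 1°C is returned unchanged because no pair of centers is farther apart than the threshold.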

Description

Technical field

[0001] The invention relates to the field of noise removal, and in particular to a noise data removal method based on an improved k-means algorithm and a system implementing it.

Background

[0002] Noisy data can be erroneous data in a dataset, random errors or biases in measured variables, or irrelevant or meaningless data. It is often caused by faults in the instruments that collect the data, errors in data transmission, technical limitations, or data entry errors. For example, during collection in a sensor network, sensor failure or human factors can make the collected data fluctuate greatly over a certain period. This fluctuation is meaningless for subsequent mining tasks and pushes the data outside the specified data domain, which degrades subsequent mining results, so it needs to be eliminated. Commonly used methods for eliminating noise data are the binning method, the regression method, and the clustering method. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G16Z99/00, G06F18/23213
Inventor: 黄静
Owner: ZHEJIANG SCI-TECH UNIV