LOF outlier detection method and system based on grid pruning

An outlier detection technology, applied in the field of data processing, that addresses the poor practicability and high complexity of existing methods.

Pending Publication Date: 2019-11-19
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a LOF outlier detection method and system based on grid pruning to overcome the high computational complexity and poor practicability of the existing density-based outlier detection algorithms described in the prior art above.

Examples

Embodiment 1

[0067] This embodiment provides a method for detecting LOF outliers based on grid pruning.

[0068] As shown in Figure 1, the method includes the following steps:

[0069] S1: Input the data set and preprocess the data set;

[0070] S2: Given the input value s, the number of equal-length intervals, divide each dimension of the data set into s equal-width intervals; at the same time, calculate the boundary range of each grid and number the grids;

[0071] In a high-dimensional space, each dimension is divided into s segments, and the data set is partitioned by the dividing lines marked along each dimension. The sections cut out in this way define the grid boundaries. The specific boundary values are determined by the dimensionality of the data, the size of the data set, and the given number of division intervals s.
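The equal-width division in S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `grid_boundaries`, the use of the per-dimension minimum and maximum as the outer boundaries, and the row-major grid numbering are all assumptions, since the excerpt does not specify them.

```python
from itertools import product

def grid_boundaries(data, s):
    """Split each dimension of `data` into s equal-width intervals.

    Returns (edges, cells): edges[d] holds the s + 1 cut points along
    dimension d; cells maps a grid number to its per-dimension
    (low, high) boundary range.
    """
    dims = len(data[0])
    edges = []
    for d in range(dims):
        lo = min(p[d] for p in data)   # assumed outer boundary: observed minimum
        hi = max(p[d] for p in data)   # assumed outer boundary: observed maximum
        width = (hi - lo) / s
        edges.append([lo + i * width for i in range(s)] + [hi])
    cells = {}
    # Assumed numbering: row-major order over the per-dimension interval indices.
    for number, idx in enumerate(product(range(s), repeat=dims)):
        cells[number] = [(edges[d][i], edges[d][i + 1]) for d, i in enumerate(idx)]
    return edges, cells

points = [(0.0, 0.0), (4.0, 2.0), (1.0, 1.5), (3.0, 0.5)]
edges, cells = grid_boundaries(points, s=2)
```

Enumerating every grid produces s raised to the number of dimensions entries, so for high-dimensional data a practical version would likely store only the non-empty grids.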

[0072] S3: Compare each data object in the data set with the boundary ranges of the grids to find the grid to which it belongs;
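One way to carry out S3 is sketched below. Instead of comparing each object against every grid's boundary range, this version computes the interval index arithmetically, which gives the same assignment for equal-width grids. The function name `assign_to_grids`, the row-major grid numbering, and the rule that points on the upper boundary fall into the last interval are assumptions made for illustration.

```python
from collections import defaultdict

def assign_to_grids(data, s):
    """Map each point to the number of the equal-width grid that contains it."""
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]
    members = defaultdict(list)
    for p in data:
        number = 0
        for d in range(dims):
            span = highs[d] - lows[d]
            # Interval index along dimension d; a constant dimension maps to 0.
            i = int((p[d] - lows[d]) / span * s) if span > 0 else 0
            i = min(i, s - 1)  # points on the upper boundary go to the last interval
            number = number * s + i  # assumed row-major grid numbering
        members[number].append(p)
    return dict(members)

points = [(0.0, 0.0), (4.0, 2.0), (1.0, 1.5), (3.0, 0.5)]
members = assign_to_grids(points, s=2)
```

Only non-empty grids appear in the result, which keeps memory proportional to the number of data objects rather than to the number of grids.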

...

Embodiment 2

[0107] This embodiment provides a detection system applying the grid pruning LOF outlier detection method described in Embodiment 1.

[0108] As shown in Figure 2, the system includes a data preprocessing module, a data storage module, a data cleaning module, and a Spark distributed computing module;

[0109] The input of the data preprocessing module is connected to the external data source, and its output is connected to the data storage module; the output of the data storage module is connected to the data cleaning module; the data cleaning module is connected to the Spark distributed computing module; and the Spark distributed computing module is finally connected back to the data storage module;

[0110] The data preprocessing module is responsible for data import and preprocessing, and outputs the preprocessed data to the data storage module;

[0111] The data storage module includes a distributed file system, and the data storage...

Abstract

The invention relates to an LOF outlier detection method and system based on grid pruning. The method comprises the following steps: S1, reading and preprocessing a data set; S2, dividing the data set into equal intervals, calculating the boundary range of each grid, and numbering the grids; S3, comparing each data object in the data set with the boundary ranges of the grids to find the grid to which the data object belongs; S4, calculating the grid density and clustering radius of each grid, and determining a grid density threshold and a clustering radius threshold; S5, pruning the grids; S6, carrying out outlier detection. The system comprises a data preprocessing module, a data storage module, a data cleaning module, and a Spark distributed computing module. The method addresses the poor practicability caused by the relatively high time and space complexity of existing LOF outlier detection methods when processing real-time, large-scale, high-dimensional data objects, and improves the efficiency and practicability of the calculation process.
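Steps S4 to S6 might look roughly like the sketch below. It rests on several loudly stated assumptions, because this excerpt does not give the details: grid density is taken to be the number of points in a grid, the density threshold is taken to be the mean density over non-empty grids, grids above the threshold are pruned (their points are treated as non-outliers), and the clustering radius test is left out entirely because it is not defined here. The LOF part follows the standard definition, simplified to use exactly k neighbours. All function names are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache

def cell_of(p, lows, highs, s):
    """Row-major number of the equal-width grid containing p (assumed numbering)."""
    number = 0
    for x, lo, hi in zip(p, lows, highs):
        i = min(int((x - lo) / (hi - lo) * s), s - 1) if hi > lo else 0
        number = number * s + i
    return number

def prune_candidates(data, s):
    """Indices of points in grids whose density does not exceed the threshold."""
    dims = range(len(data[0]))
    lows = [min(p[d] for p in data) for d in dims]
    highs = [max(p[d] for p in data) for d in dims]
    cells = [cell_of(p, lows, highs, s) for p in data]
    density = Counter(cells)                          # assumed: density = point count
    threshold = sum(density.values()) / len(density)  # assumed: mean over non-empty grids
    return [i for i, c in enumerate(cells) if density[c] <= threshold]

def lof_scores(data, candidates, k):
    """Standard LOF for the candidate indices, with neighbours drawn from all data.

    Simplification: exactly k neighbours are used, with distance ties broken
    by index. Assumes no point has k or more exact duplicates.
    """
    @lru_cache(maxsize=None)
    def neighbours(i):
        order = sorted((math.dist(data[i], data[j]), j)
                       for j in range(len(data)) if j != i)
        return tuple(order[:k])

    def k_distance(i):
        return neighbours(i)[-1][0]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance(j), d) for d, j in neighbours(i)]
        return len(reach) / sum(reach)

    return {i: sum(lrd(j) for _, j in neighbours(i)) / (len(neighbours(i)) * lrd(i))
            for i in candidates}

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
candidates = prune_candidates(data, s=2)
scores = lof_scores(data, candidates, k=2)
```

In this toy data set the five clustered points share one dense grid and are pruned, so the LOF computation runs only for the isolated point. A score well above 1 marks a candidate as a likely outlier.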

Description

Technical field

[0001] The present invention relates to the field of data processing, and more specifically to a method and system for detecting LOF outliers based on grid pruning.

Background technique

[0002] Outlier mining is an important research direction in data mining. In the process of data mining and analysis, some special data or data segments are often found whose fluctuations differ significantly from those of the other data in the data set. Such rare data points or data segments are called outliers. The appearance of outliers seriously affects the efficiency of data utilization and the quality of decision-making. At the same time, outlier data often enables people to discover potentially useful knowledge. With the acceleration of urbanization, the data in today's life is often high-dimensional, large-scale, and multi-source heterogeneous, which puts forward highe...

Application Information

IPC(8): G06F16/2458, G06F16/215
CPC: G06F16/2462, G06F16/215
Inventors: 张绪升, 谢胜利
Owner: GUANGDONG UNIV OF TECH