Large-scale data abnormity detection method

A technology for large-scale data anomaly detection, applied in electrical digital data processing, special data processing applications, genetic models, etc. It addresses the poor detection performance of existing algorithms and their susceptibility to complex data, and it improves anomaly detection performance while reducing both the computational workload and the volume of data to be processed.

Status: Inactive. Publication date: 2017-10-24

UNIV OF ELECTRONICS SCI & TECH OF CHINA

Cites: 0. Cited by: 27.

AI Technical Summary

Problems solved by technology

However, because the SCIFOREST algorithm was designed and tested only on experimental data, its detection performance in practice is poor on unbalanced, mixed, and high-dimensional large-scale data, and it is easily affected by complex data.

Embodiment Construction

[0039] As shown in Figure 1, the anomaly detection method for large-scale data of the present invention comprises the following steps:

[0040] A. Data preprocessing and feature extraction: perform the necessary preprocessing on the original data, including data integration, data reduction and data cleaning, to obtain a preprocessed data set and sample subsets. Feature extraction is then performed on the preprocessed data, including:

[0041] A1. Data resampling: balance the preprocessed samples according to a preset ratio of positive to negative classes, reducing the influence of negative samples on feature extraction;

[0042] A2. Calculation of the information gain ratio: calculate the information gain ratio of each feature on the data of the multiple sample subsets, and sort the results to form multiple feature sets. The information gain ratio of a feature is calculated as follows:

[0043] Suppose the data set is D and the features are A_i (i = 1, ..., k); first calculate the entropy H...
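The two feature-extraction steps A1 and A2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure: resampling is assumed to be random undersampling of the negative class down to a preset number of negatives per positive (`ratio` is a hypothetical parameter), and because the entropy formula in [0043] is truncated, the gain ratio shown is the standard C4.5 definition (information gain divided by split information) computed on discrete or pre-discretised feature values. The helper names `resample_to_ratio`, `gain_ratio`, `rank_features` and `feature_sets` are illustrative only.

```python
import math
import random
from collections import Counter


def resample_to_ratio(samples, labels, pos_label=1, ratio=1.0, seed=0):
    """Hypothetical step A1: randomly undersample the negative class so that
    at most `ratio` negatives are kept per positive sample."""
    rng = random.Random(seed)
    pos = [(x, y) for x, y in zip(samples, labels) if y == pos_label]
    neg = [(x, y) for x, y in zip(samples, labels) if y != pos_label]
    keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, keep)
    rng.shuffle(kept)
    return [x for x, _ in kept], [y for _, y in kept]


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """C4.5 gain ratio of one discrete feature: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(column)  # entropy of the feature's own value distribution
    return gain / split_info if split_info > 0 else 0.0


def rank_features(rows, labels):
    """(feature index, gain ratio) pairs, sorted from highest to lowest ratio."""
    scores = [(i, gain_ratio([r[i] for r in rows], labels)) for i in range(len(rows[0]))]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def feature_sets(subsets, top_m):
    """Hypothetical step A2 output: for each (rows, labels) sample subset,
    keep the indices of the top_m features by gain ratio."""
    return [[i for i, _ in rank_features(rows, labels)[:top_m]] for rows, labels in subsets]
```

For example, with rows [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')] and labels [0, 0, 1, 1], feature 0 separates the classes perfectly and gets a gain ratio of 1.0, while feature 1 carries no class information and gets 0.0.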

Abstract

The invention relates to a large-scale data anomaly detection method comprising the following steps: A. data preprocessing and feature extraction; B. hyperplane calculation based on twin support vector machines, constructing a hyperplane criterion function that partitions the data space; C. forming an isolation tree, built using the hyperplane partition criterion of the twin support vector machines; D. forming an isolation forest by repeating step C to construct multiple isolation trees; and E. traversing the isolation forest and calculating the anomaly score: the data under test are passed through the isolation forest, the resulting anomaly score serves as the criterion for judging the degree of anomaly, and this criterion is used to determine whether abnormal data exist in the original data. The method effectively reduces the volume of data to be examined and hence the computational workload; it improves anomaly detection accuracy without a significant increase in running time, and it greatly improves detection performance on high-dimensional data.
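Steps B to E can be illustrated with a small isolation forest whose nodes split the data with oblique hyperplanes. One substitution should be stated plainly: this sketch draws a random hyperplane at each node (in the spirit of SCiForest and extended isolation forests) instead of computing the twin-support-vector-machine hyperplane the invention describes, because the partition criterion is not given in this excerpt. The anomaly score is assumed to be the standard isolation-forest score s(x, psi) = 2^(-E[h(x)] / c(psi)), where h(x) is the path length of x, psi is the subsample size and c(psi) is the average path length of an unsuccessful binary-search-tree lookup over psi points; scores close to 1 indicate likely anomalies. All function names and parameters (`n_trees`, `subsample`, `seed`) are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation-tree path lengths (standard iForest term)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def random_hyperplane(points, rng):
    """Stand-in for step B (NOT a twin SVM): a random normal vector w and an
    offset b drawn between the smallest and largest projections w.x."""
    w = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = [dot(w, p) for p in points]
    return w, rng.uniform(min(proj), max(proj))


def build_tree(points, depth, max_depth, rng):
    """Step C: recursively split the points with hyperplanes until isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    w, b = random_hyperplane(points, rng)
    left = [p for p in points if dot(w, p) < b]
    right = [p for p in points if dot(w, p) >= b]
    if not left or not right:  # degenerate split, e.g. all points identical
        return {"size": len(points)}
    return {"w": w, "b": b,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for leaves left unsplit."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if dot(node["w"], x) < node["b"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, n_trees=100, subsample=256, seed=0):
    """Step D: many trees, each grown on a random subsample (needs len(data) >= 2)."""
    rng = random.Random(seed)
    psi = min(subsample, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, forest):
    """Step E: s = 2 ** (-mean path length / c(psi)); values near 1 suggest an anomaly."""
    trees, psi = forest
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))
```

Replacing `random_hyperplane` with a routine that fits a twin-SVM hyperplane to the points at each node would move the sketch closer to step B as described; the tree, forest and scoring functions would not need to change.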

Description

Technical field

[0001] The invention relates to a data mining method, in particular to a large-scale data anomaly detection method.

Background art

[0002] Anomaly detection refers to finding, by appropriate technical means, data objects that differ markedly from most other data. Such data are generally very few compared with normal data. The objects of anomaly detection are called anomalies, isolated points, or outliers. Although these data are often hidden among normal data and cannot be found directly, they may carry important information and are therefore of great research value. In 1980, Hawkins first defined an outlier as a value so different from other values that it raises the suspicion of having been generated by a different, unknown mechanism. From then on, outliers were no longer treated as noise in data mining, nor as data to be discarded during preprocessing. W...

Application Information

Patent type & authority: Application (China)

IPC(8): G06K9/62; G06F17/30; G06N3/12

CPC: G06F16/215; G06N3/126; G06F18/2411; G06F18/24323

Inventors: 罗光春 (Luo Guangchun), 殷光强 (Yin Guangqiang), 田玲 (Tian Ling), 闫科 (Yan Ke)

Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
