Long-time-series delta-anomaly-point detection method based on probabilistic suffix tree (PST)

A long-time-series anomaly-point detection technique based on a probabilistic suffix tree, applicable to pattern recognition in signals and to instrument, character, and pattern recognition. It addresses the problem that few existing algorithms detect individual abnormal data points.

Status: Inactive; Publication Date: 2018-03-27
FUDAN UNIV
Cites: 3; Cited by: 11

AI Technical Summary

Problems solved by technology

Existing algorithms focus on finding abnormal time series within a time series database; few algorithms detect individual abnormal data points.


Examples

Embodiment 1

[0095] The system is programmed in the MyEclipse environment, with Java virtual machine version 1.8. In a specific implementation, the method is carried out in the following steps:

[0096] (1) The long time series is discretized using the SAX method;
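The SAX step named in (1) can be illustrated with a minimal sketch. The source gives no parameters, so everything specific here is an assumption: the class name `SaxSketch`, an alphabet of four symbols `a` to `d`, the standard-normal breakpoints for four equiprobable regions, and the segment count passed by the caller. It follows the usual SAX outline (z-normalize, average over segments, map averages to symbols) and is not the patent's own implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of SAX discretization; names and parameters are assumptions.
final class SaxSketch {
    // Breakpoints splitting a standard normal distribution into 4 equiprobable regions.
    private static final double[] BREAKPOINTS_4 = {-0.6745, 0.0, 0.6745};
    private static final char[] ALPHABET_4 = {'a', 'b', 'c', 'd'};

    /** Z-normalizes the series to zero mean and unit standard deviation. */
    static double[] zNormalize(double[] series) {
        double mean = Arrays.stream(series).average().orElse(0.0);
        double variance = Arrays.stream(series)
                .map(x -> (x - mean) * (x - mean)).average().orElse(0.0);
        double std = Math.sqrt(variance);
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            // A constant series has zero deviation; map it to all zeros.
            out[i] = std == 0.0 ? 0.0 : (series[i] - mean) / std;
        }
        return out;
    }

    /** Piecewise aggregate approximation: the mean of each of `segments` chunks. */
    static double[] paa(double[] series, int segments) {
        int n = series.length;
        if (segments < 1 || segments > n) {
            throw new IllegalArgumentException("segments must be in [1, series.length]");
        }
        double[] out = new double[segments];
        for (int i = 0; i < segments; i++) {
            int start = i * n / segments;
            int end = (i + 1) * n / segments;
            double sum = 0.0;
            for (int j = start; j < end; j++) {
                sum += series[j];
            }
            out[i] = sum / (end - start);
        }
        return out;
    }

    /** Maps each value to the symbol of the breakpoint region it falls in. */
    static String toSymbols(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (double v : values) {
            int idx = 0;
            while (idx < BREAKPOINTS_4.length && v >= BREAKPOINTS_4[idx]) {
                idx++;
            }
            sb.append(ALPHABET_4[idx]);
        }
        return sb.toString();
    }

    /** Full pipeline: normalize, reduce with PAA, then symbolize. */
    static String discretize(double[] series, int segments) {
        return toSymbols(paa(zNormalize(series), segments));
    }
}
```

For example, `SaxSketch.discretize(new double[]{1, 1, 2, 2, 3, 3, 4, 4}, 4)` yields the symbol string `abcd`, since the four segment means fall into the four regions in increasing order.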

[0097] (2) The algorithm for constructing the probabilistic suffix tree (PST) is shown in Table 2;

[0098] Table 2. PST construction algorithm

[0101] The construction process is divided into two parts. First, the entire tree structure is constructed and the corresponding symbol string is assigned to each tree node. Then the symbolized training data set S is traversed to count v.count and v.nextSymbol[s] (s ∈ Σ) for each tree node v and to calculate v.branchingProbability[s] (s ∈ Σ).
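The two-part construction described above can be sketched as follows. Table 2 is not reproduced in the source, so this is an illustration under stated assumptions rather than the patented algorithm. The field names `string`, `count`, `nextSymbol`, and `branchingProbability` follow the text. The class name `PstSketch`, the `maxDepth` parameter, storing nodes in a map keyed by their string, forming a child by prepending a symbol to its parent's string, and counting only occurrences that are followed by another symbol are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a two-part PST construction; not the algorithm of Table 2.
final class PstSketch {
    static final class Node {
        final String string;                  // v.string: the context this node represents
        int count;                            // v.count: occurrences followed by a symbol (assumption)
        final int[] nextSymbol;               // v.nextSymbol[s]: times the context is followed by s
        final double[] branchingProbability;  // v.branchingProbability[s]

        Node(String string, int alphabetSize) {
            this.string = string;
            this.nextSymbol = new int[alphabetSize];
            this.branchingProbability = new double[alphabetSize];
        }
    }

    /** Builds all nodes up to maxDepth, then fills counts from s (assumed to use only alphabet symbols). */
    static Map<String, Node> build(String s, String alphabet, int maxDepth) {
        // Part 1: create the whole tree layer by layer; the root (layer 0) holds the empty string.
        Map<String, Node> nodes = new HashMap<>();
        List<String> layer = new ArrayList<>();
        layer.add("");
        nodes.put("", new Node("", alphabet.length()));
        for (int depth = 1; depth <= maxDepth; depth++) {
            List<String> next = new ArrayList<>();
            for (String parent : layer) {
                for (char c : alphabet.toCharArray()) {
                    String child = c + parent;  // child extends the context one symbol further back
                    nodes.put(child, new Node(child, alphabet.length()));
                    next.add(child);
                }
            }
            layer = next;
        }

        // Part 2a: traverse s; for each position, update every context that precedes it.
        for (int i = 0; i < s.length(); i++) {
            int symbol = alphabet.indexOf(s.charAt(i));
            for (int len = 0; len <= maxDepth && len <= i; len++) {
                Node v = nodes.get(s.substring(i - len, i));
                v.count++;
                v.nextSymbol[symbol]++;
            }
        }

        // Part 2b: turn the counts into branching probabilities.
        for (Node v : nodes.values()) {
            for (int k = 0; k < v.nextSymbol.length; k++) {
                v.branchingProbability[k] = v.count == 0 ? 0.0 : (double) v.nextSymbol[k] / v.count;
            }
        }
        return nodes;
    }
}
```

With the training string `abab`, alphabet `ab`, and `maxDepth` 1, the sketch creates three nodes (the root, `a`, and `b`); the node `a` is followed by `b` in both of its counted occurrences, so its branching probability for `b` is 1.0, while the root assigns 0.5 to each symbol.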

[0102] In this embodiment, a layer-by-layer construction method is used to assign a value to v.string of each PST node. The root node of the PST belongs to the zeroth layer, and the number of nodes in the fi...

Abstract

The invention belongs to the field of time series anomaly detection and relates to a method for detecting anomaly points in long symbol strings based on a probabilistic suffix tree (PST). The method uses a discretization technique for continuous data together with a probabilistic suffix tree model to detect anomalous data points in long time series. Its steps are: discretizing the originally continuous long time series data to obtain a long symbol string; constructing the probabilistic suffix tree from a symbolized training data set; using the constructed PST to detect the delta-anomaly points in a data set to be tested; and using the F1-measure to evaluate the detection effect. Experimental results show that the method effectively supports various long time series, achieves high recall, accuracy, and precision, detects anomalies well, and can be applied in fields such as aerospace, medical data analysis, financial data analysis, and network abnormal behavior detection.
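The evaluation step can be illustrated with the standard F1-measure, the harmonic mean of precision and recall. The sketch below assumes that detected and true anomaly points are given as sets of positions in the sequence; the class name `F1Sketch` and this representation are assumptions, since the source does not describe how the evaluation is coded.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the F1-measure over sets of anomaly positions.
final class F1Sketch {
    /** Number of detected positions that are true anomalies (true positives). */
    private static int truePositives(Set<Integer> detected, Set<Integer> actual) {
        Set<Integer> hits = new HashSet<>(detected);
        hits.retainAll(actual);
        return hits.size();
    }

    /** Fraction of detected points that are true anomalies. */
    static double precision(Set<Integer> detected, Set<Integer> actual) {
        return detected.isEmpty() ? 0.0 : (double) truePositives(detected, actual) / detected.size();
    }

    /** Fraction of true anomalies that were detected. */
    static double recall(Set<Integer> detected, Set<Integer> actual) {
        return actual.isEmpty() ? 0.0 : (double) truePositives(detected, actual) / actual.size();
    }

    /** Harmonic mean of precision and recall; 0 when both are 0. */
    static double f1(Set<Integer> detected, Set<Integer> actual) {
        double p = precision(detected, actual);
        double r = recall(detected, actual);
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

If a detector reports positions {1, 2, 3, 4} and the true anomalies are at {2, 4, 6}, precision is 2/4, recall is 2/3, and the F1-measure is 4/7.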

Description

Technical field

[0001] The invention belongs to the technical field of time series anomaly detection and relates to discretizing an original time series by means of a symbolization method, in particular to a method for detecting anomaly points in long symbol strings based on a probabilistic suffix tree.

Background art

[0002] Time series data is a form of data that appears frequently in everyday applications and is widely used in fields such as aerospace, medical data analysis, financial data analysis, network abnormal behavior detection, and weather forecasting. In these fields, mining frequent patterns in a sequence may fail to reveal the abnormal information hidden in the data's behavior, yet such abnormal information usually reflects a real problem. For example, abnormal data in a user's daily operation records may mean that the account password has been compromised or comp...

Application Information

IPC(8): G06K9/00, G06K9/62
CPC: G06F2218/02, G06F2218/08, G06F18/231, G06F18/2193, G06F18/2415, G06F18/214
Inventors: 杨卫东, 丁希颖
Owner: FUDAN UNIV