63 results about "Concept drift" patented technology

In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.
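
A toy illustration of this definition, as a minimal Python sketch: the inputs stay the same, but the rule that maps them to labels moves partway through the stream, so a model that fit the old rule perfectly loses accuracy. The thresholds 50 and 30 and all function names are hypothetical choices made only for this example.

```python
def old_concept(x: int) -> bool:
    # Hypothetical original rule: positive when x >= 50.
    return x >= 50


def new_concept(x: int) -> bool:
    # Hypothetical rule after the drift: the boundary has moved to x >= 30.
    return x >= 30


def accuracy(model, xs, true_label) -> float:
    """Fraction of inputs on which the model agrees with the true labelling."""
    return sum(model(x) == true_label(x) for x in xs) / len(xs)


xs = list(range(100))  # identical inputs before and after the drift
model = old_concept    # a model that learned the original concept exactly

acc_before = accuracy(model, xs, old_concept)
acc_after = accuracy(model, xs, new_concept)
print(f"before drift: {acc_before:.2f}")  # before drift: 1.00
print(f"after drift:  {acc_after:.2f}")   # after drift:  0.80
```

Only the inputs 30 to 49 change label, so the stale model is wrong on 20 of the 100 inputs.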

System and method for learning models from scarce and skewed training data

A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. The most likely current class distribution, used to classify portions of the data stream, is determined from the training data observed in the current time window and from concept drift probability patterns derived from historical information.
Owner:IBM CORP
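
A minimal sketch of the windowing idea in this abstract, assuming fixed-size windows over a stream of class labels. The `blended_estimate` helper and its `alpha` weight are hypothetical: they mix the current window's class distribution with the average of earlier windows, as a simple stand-in for the patent's concept drift probability patterns, which the abstract does not specify.

```python
from collections import Counter


def partition(stream, window_size):
    """Split a stream of class labels into consecutive fixed-size time windows."""
    return [stream[i:i + window_size] for i in range(0, len(stream), window_size)]


def class_distribution(labels, classes):
    """Relative frequency of each class within one window."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def blended_estimate(windows, classes, alpha=0.5):
    """Estimate the current class distribution (hypothetical blending rule).

    alpha weights the newest window and 1 - alpha weights the mean of all
    earlier windows, so a scarce current window is smoothed by history.
    """
    current = class_distribution(windows[-1], classes)
    history = [class_distribution(w, classes) for w in windows[:-1]]
    if not history:
        return current
    past = {c: sum(h[c] for h in history) / len(history) for c in classes}
    return {c: alpha * current[c] + (1 - alpha) * past[c] for c in classes}


stream = ["a"] * 8 + ["b"] * 2 + ["a"] * 2 + ["b"] * 8
windows = partition(stream, 10)
print(blended_estimate(windows, ["a", "b"]))  # {'a': 0.5, 'b': 0.5}
```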

Systems and/or methods for dynamic anomaly detection in machine sensor data

Certain example embodiments relate to techniques for detecting anomalies in streaming data. More particularly, certain example embodiments use an approach that combines both unsupervised and supervised machine learning techniques to create a shared anomaly detection model in connection with a modified k-means clustering algorithm and advantageously also enables concept drift to be taken into account. The number of clusters k need not be known in advance, and it may vary over time. Models are continually trainable as a result of the dynamic reception of data over an unknown and potentially indefinite time period, and clusters can be built incrementally and in connection with an updatable distance threshold that indicates when a new cluster is to be created. Distance thresholds also are dynamic and adjustable over time.
Owner:SOFTWARE AG USA
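
The core mechanism described here, clusters built incrementally with a distance threshold deciding when a new cluster is created, can be sketched as a leader-style online clusterer. This is not Software AG's modified k-means: the class name and the fixed `threshold` are assumptions, and the patent additionally makes the threshold adjustable over time.

```python
import math


class ThresholdClusterer:
    """Online clustering in which the number of clusters k is not fixed.

    A point joins its nearest cluster if it lies within `threshold` of that
    cluster's centroid; otherwise it starts a new cluster.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.centroids = []  # one coordinate list per cluster
        self.counts = []     # number of points absorbed by each cluster

    def add(self, point):
        """Assign `point` to a cluster and return that cluster's index."""
        if self.centroids:
            dists = [math.dist(point, c) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                # Incremental mean update of the winning centroid.
                self.counts[best] += 1
                n = self.counts[best]
                self.centroids[best] = [
                    c + (p - c) / n for c, p in zip(self.centroids[best], point)
                ]
                return best
        # Too far from every existing cluster: open a new one.
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


clusterer = ThresholdClusterer(threshold=1.0)
labels = [clusterer.add(p) for p in [(0, 0), (0.5, 0), (10, 10), (10.2, 10)]]
print(labels)  # [0, 0, 1, 1]
```

Because clusters are only ever added when data no longer fits the existing ones, the model keeps learning from an unbounded stream without retraining from scratch.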

System and method for mining time-changing data streams

A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. An empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
Owner:SERVICENOW INC +1
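
A minimal sketch of a chunk-based weighted ensemble, assuming one-dimensional data and simple threshold rules in place of C4.5, RIPPER, or naive Bayes. Weighting each member by its accuracy on the newest chunk is a simplified proxy for the "expected classification accuracy on the test data" in the abstract; all names and the toy stream are hypothetical.

```python
def train_stump(chunk):
    """Fit a 1-D threshold rule (predict True when x >= t) to one chunk.

    chunk is a list of (x, label) pairs; the observed x values are the
    candidate thresholds, and the most accurate one is kept.
    """
    best_t, best_acc = None, -1.0
    for t, _ in chunk:
        acc = sum((x >= t) == y for x, y in chunk) / len(chunk)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x, t=best_t: x >= t


def accuracy(model, chunk):
    return sum(model(x) == y for x, y in chunk) / len(chunk)


def weighted_vote(models, weights, x):
    """Weighted majority vote: each member adds +w for True and -w for False."""
    score = sum(w if m(x) else -w for m, w in zip(models, weights))
    return score > 0


# Hypothetical stream: the decision boundary moves from 5 to 3 after chunk 1.
chunks = [
    [(x, x >= 5) for x in range(10)],
    [(x, x >= 3) for x in range(10)],
    [(x, x >= 3) for x in range(10)],
]
models = [train_stump(c) for c in chunks]
# Weight each member by its accuracy on the newest chunk.
weights = [accuracy(m, chunks[-1]) for m in models]
print(weights)                            # [0.8, 1.0, 1.0]
print(weighted_vote(models, weights, 4))  # True
```

The member trained on the outdated concept is down-weighted rather than discarded, so it still contributes where the old and new concepts agree.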

Data stream anomaly detection system based on empirical features and convolution neural network

The invention discloses a data stream anomaly detection system based on empirical features and a convolutional neural network. The system includes: an empirical feature extraction module, which uses human expertise to select statistical features and header features that play a particularly important role in recognizing anomalous data packets; a bit-stream-to-image module, which converts the data stream into a two-dimensional grayscale image from which a convolutional neural network extracts global high-level perceptual features; a fusion module, which concatenates the outputs of the above modules into the data stream features and identifies abnormal data streams with the network's fully connected layer; a distillation model module, which replaces the complex network in actual deployment; a concept drift fine-tuning module, which updates the detection model when concept drift occurs; and an experience database update module, which adds new network attacks or hidden attack instructions to the expert-knowledge database. The invention accurately and efficiently detects abnormal behaviors such as network failures, user misoperations, and network attacks.
Owner:ARMY ENG UNIV OF PLA

Method for classifying data streams under dynamic data environment

The invention relates to the technical field of intelligent information processing and discloses a method for classifying data streams in a dynamic data environment. The method comprises the following steps: partitioning the data stream into blocks; building different classifiers for different concepts and storing them in a classifier feature pool; when a new data block arrives, using the Kullback-Leibler (KL) divergence to judge whether concept drift has occurred; if no drift has occurred, classifying with the classifier used at the previous moment; if drift has occurred, using the KL divergence to look up a suitable classifier in the pool and classifying with it; and if no matching classifier exists, training a new classifier, adding it to the pool, and deleting outdated classifiers. The method can detect both gradual and abrupt concept drift; when drift occurs, selecting a suitable classifier keeps the model efficient, and deleting outdated classifiers maintains its performance.
Owner:DALIAN UNIV OF TECH
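
A minimal sketch of the pool lookup step, assuming each data block is summarized as a histogram over a few discrete bins and each pooled classifier is stored with the histogram it was trained on. The add-one smoothing, the bin names, and the `threshold` value are assumptions; the same `kl_divergence` function could also compare a new block with the previous one for the initial drift check.

```python
import math
from collections import Counter


def histogram(block, bins):
    """Relative frequencies over `bins`; add-one smoothing keeps KL finite."""
    counts = Counter(block)
    total = len(block) + len(bins)
    return [(counts[b] + 1) / total for b in bins]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_classifier(block, pool, bins, threshold):
    """Key of the pooled classifier whose stored distribution is closest to
    `block` by KL divergence, or None if none is within `threshold`
    (meaning a new classifier should be trained and added to the pool)."""
    p = histogram(block, bins)
    best_key, best_kl = None, math.inf
    for key, q in pool.items():
        d = kl_divergence(p, q)
        if d < best_kl:
            best_key, best_kl = key, d
    return best_key if best_kl <= threshold else None


bins = ["low", "mid", "high"]
pool = {  # hypothetical pool: classifier name -> distribution it was trained on
    "clf_A": histogram(["low"] * 8 + ["mid"] * 2, bins),
    "clf_B": histogram(["high"] * 8 + ["mid"] * 2, bins),
}
print(select_classifier(["high"] * 7 + ["mid"] * 3, pool, bins, 0.1))  # clf_B
print(select_classifier(["low"] * 5 + ["high"] * 5, pool, bins, 0.1))  # None
```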

Selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows

The invention relates to the technical field of data mining and discloses a selective up-sampling combined method for weighted ensemble classification prediction of imbalanced data streams. The method comprises the following steps: screening minority-class samples from historical data blocks by similarity and selecting those conceptually closest to the current training data block; synthesizing new samples from the selected samples in the decision boundary region, so that up-sampling is applied selectively; and performing weighted ensemble classification on the new samples with a weight distribution strategy based on probability distribution relevancy. By selecting highly similar historical data and synthesizing new data in the boundary region, the method effectively increases minority-class information and enlarges the minority-class decision region. Meanwhile, to adapt to dynamic data with concept drift within an ensemble classification framework, the probability-distribution-relevancy-based weighting strategy is designed to improve overall classification precision. Experimental results show that the method effectively improves the minority-class recognition rate and overall classification performance, and handles imbalanced data streams better.
Owner:NORTHEASTERN UNIV
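
The "synthesize new minority samples" step resembles SMOTE-style interpolation, sketched below as a swapped-in, well-known technique rather than the patent's exact procedure. The similarity screening of historical blocks and the restriction to the decision boundary region are omitted, and the function name and sample data are hypothetical.

```python
import math
import random


def synthesize_minority(samples, n_new, rng):
    """SMOTE-style oversampling: interpolate between a random minority sample
    and its nearest minority neighbour. Requires at least two samples."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        a = samples[i]
        b = min((s for j, s in enumerate(samples) if j != i),
                key=lambda s: math.dist(a, s))
        gap = rng.random()  # position along the segment from a to b, in [0, 1)
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(synthesize_minority(minority, 3, random.Random(42)))
```

Every synthetic point lies on a segment between two real minority samples, which enlarges the minority region without inventing points far from existing data.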

Self-adaptive trojan communication behavior detection method on basis of dynamic feedback

Status: Active · Publication: CN103532949A · Tags: Eliminate redundancy; Reduce false positive information; Transmission; Relevant information; Similarity analysis
The invention discloses a self-adaptive trojan communication behavior detection method based on dynamic feedback. The method comprises processing trojan detection alarm information, constructing a sample set for dynamic feedback learning from the alarm information, and determining when to update detection by detecting concept drift in the data stream. Processing the alarm information involves merging and correlating alarms that have been put into a standard description, then building intrusion track events and storing them in an intrusion event table. To make information-stealing trojan detection self-adaptive, the invention analyzes the detection alarm information with a combination of similarity analysis, clustering analysis, and related methods; acquires additional information about the target IP (Internet Protocol) address through active probing; builds the dynamic feedback learning sample set from the alarm information; uses an incremental support vector machine as the dynamic feedback learning algorithm; and determines when to update the detection system by detecting concept drift in the data stream.
Owner:PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU

Multi-classifier integrating method based on increment native Bayes network

Status: Inactive · Publication: CN101251851A · Tags: Avoid catastrophic forgetting; Improve classification prediction results; Special data processing applications; Multiple classifier; Concept drift
The invention relates to a multiple-classifier integration method based on incremental naive Bayesian networks, comprising the following steps: initializing the ensemble classifier and its key parameters; ending the process if no new data exists; predicting the category of a new data item with the current ensemble classifier; dynamically updating the parameter values of all individual classifiers; updating the weights of all individual classifiers; if the ensemble classifier predicts the category of the new data item without error, training all individual classifiers in the ensemble on the new data item; deleting redundant individual classifiers according to a KL pruning strategy; adding a new individual classifier; and training all individual classifiers on the new data item. The method can effectively improve classification predictions when concept shift occurs and is particularly suited to handling the concept shift problem.
Owner:JILIN UNIV

Information entropy-based self-adaptive integrated classification method of data streams

The invention discloses an information entropy-based self-adaptive ensemble classification method for data streams. The method can detect concept drift and also identify recurring concepts. The system builds a new classifier and puts it into the classifier pool only when a new concept is detected, which avoids the repeated training that recurring concepts would otherwise cause, reduces the model update frequency, and improves the model's real-time classification ability and classification performance. In performance comparisons with classical data stream algorithms on synthetic and real datasets, experiments show that the method copes with multiple types of concept drift, improves the noise robustness of the classification model, and has lower time cost while maintaining high classification accuracy. The method can be applied to many practical problems, such as sensor network anomaly detection, credit-card fraud detection, weather forecasting, and electricity price prediction.
Owner:XINYANG NORMAL UNIVERSITY
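
The abstract does not say which entropy is measured, so the following is only a minimal sketch under one assumption: the Shannon entropy of the class labels in each data block, with drift flagged when it changes by more than a hypothetical `threshold`. Recurring-concept identification and the classifier pool are not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the label distribution in one data block."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_drift(prev_block, new_block, threshold=0.3):
    """Flag drift when block entropy changes by more than `threshold` bits."""
    return abs(entropy(new_block) - entropy(prev_block)) > threshold


balanced = ["a", "b"] * 5
skewed = ["a"] * 9 + ["b"]
print(round(entropy(balanced), 3), round(entropy(skewed), 3))  # 1.0 0.469
print(entropy_drift(balanced, skewed))                          # True
```

An entropy test is cheap, since it needs only label counts per block, but it cannot tell two different concepts apart when their label mixes happen to have the same entropy.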

Multi-model malicious code detection method based on reliability probability interval

The invention provides a multi-model malicious code detection system based on reliability probability intervals. Each machine learning detection model corresponds to a distribution of the underlying data, and various threshold-based detection models can be integrated into a statistical platform, so that the distribution of semantic code data is examined from multiple angles and the model degradation caused by concept drift is mitigated. Instead of the 0-or-1 predictions of existing machine learning detection models, the system computes scores with the existing detection models, performs statistical analysis, and fits an isotonic regression function between sample scores and sample labels. For an unknown sample, the score given by an existing detection model is fed into the fitted isotonic regression function to obtain a reliability probability interval for a given label. This interval mitigates the overfitting of a fixed threshold to the training set, improves the detection model's adaptability to current dynamic data, and allows the concept drift phenomenon to be detected early.
Owner:NANKAI UNIV
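
Fitting an isotonic regression from detector scores to labels can be done with the standard pool-adjacent-violators (PAV) algorithm, sketched below. This computes point estimates only; the patent's reliability probability interval is not reproduced, and the function names and sample data are hypothetical.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map score -> P(label = 1).

    Returns (upper_score, probability) steps in increasing score order.
    """
    blocks = []  # each block: [sum_of_labels, count, highest_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge with the previous block while their means decrease.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            y_sum, count, s_max = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [(s_max, y_sum / count) for y_sum, count, s_max in blocks]


def predict(steps, score):
    """Calibrated probability for a new score (step function, clamped at the top)."""
    for s_max, p in steps:
        if score <= s_max:
            return p
    return steps[-1][1]


steps = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
print(steps)                 # [(0.1, 0.0), (0.3, 0.5), (0.4, 1.0), (0.5, 1.0), (0.6, 1.0)]
print(predict(steps, 0.25))  # 0.5
```

The out-of-order pair at scores 0.2 and 0.3 is pooled into one block with probability 0.5, which is how PAV enforces monotonicity without assuming any parametric form.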

Real-time detection method of equipment exceptions on the basis of synchronous data flow compression

Status: Inactive · Publication: CN106126385A · Tags: Abnormal real-time detection; Improve accuracy; Hardware monitoring; Data set; Anomaly detection
The invention discloses a real-time equipment anomaly detection method based on synchronous data stream compression. The features of each piece of equipment are collected and grouped, and two datasets are constructed: a group dataset representing the normal operating state of the equipment group, and an individual dataset representing the normal operating state of each piece of equipment. The records of the two datasets are compared to obtain a combined anomaly detection result, which improves detection accuracy. Because equipment operating states differ across environments, a concept drift detection method based on principal component analysis is applied to the operating-state data to judge whether it has evolved; if it has, both datasets are reinitialized, which further improves detection accuracy. In addition, synchronous data stream compression reduces the computation required for the comparison, enabling real-time detection of equipment anomalies.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Short-text data stream classification method based on short-text expansion and concept drift detection

The invention discloses a short-text data stream classification method based on topic models and concept drift detection. The method includes: 1, acquiring an external corpus from a knowledge base to build an LDA topic model; 2, dividing the short-text data stream into data blocks with a sliding window mechanism and using the LDA topic model to expand the short texts in each block, yielding an expanded data stream; 3, building an online BTM (biterm topic model) for each data block of the expanded short-text stream to obtain a topic representation of each short text; 4, selecting Q topic-represented data blocks to build a classifier that predicts the class labels of a newly arrived data block; 5, dividing those Q data blocks into category clusters according to their class label distribution, and computing the semantic distances between the category clusters and the newly arrived data block to judge whether concept drift has occurred; and 6, updating the classifier according to the concept drift situation. The method addresses short-text data stream classification in which the class label distribution keeps changing.
Owner:HEFEI UNIV OF TECH

Data flow detection method based on fuzzy C-means clustering algorithm and entropy theory

Status: Active · Publication: CN105069469A · Tags: High clustering accuracy; Clustering is flexible; Character and pattern recognition; Cluster algorithm; Analysis data
The present invention discloses a data stream detection method based on the fuzzy C-means (FCM) clustering algorithm and entropy theory. The FCM algorithm is applied to clustering analysis of the data stream, and the information entropy of the stream is calculated from the resulting membership degrees of the data. By analyzing the trend of this entropy, concept drift involving attribute changes is detected. The method comprises calculating the membership degrees and calculating the entropy of the data stream: the entropy values are plotted along a time axis, and concept drift with attribute changes is identified visually from the trend of the entropy curve. The detection is mainly used to prompt the system in time on whether to update its parameters, so that clustering analysis of the continuously arriving data stream remains correct.
Owner:TIANJIN JINHAN TECHNOLOGY CO LTD (天津津汉科技有限公司)
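
A minimal sketch of the membership and entropy calculation, assuming the FCM cluster centers have already been fitted and using the standard FCM membership formula with fuzzifier `m`. Averaging the per-point membership entropy over a window is an assumption about how the stream entropy is aggregated; the function names and data are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of `point` in each cluster for fixed centers.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the distance to
    center j and m > 1 is the fuzzifier. A point sitting exactly on a center
    gets full membership in that cluster.
    """
    d = [math.dist(point, c) for c in centers]
    if 0.0 in d:
        return [1.0 if dj == 0.0 else 0.0 for dj in d]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in d) for dj in d]


def membership_entropy(points, centers, m=2.0):
    """Average entropy (nats) of the membership vectors of a window of points."""
    total = 0.0
    for p in points:
        u = fcm_memberships(p, centers, m)
        total -= sum(uj * math.log(uj) for uj in u if uj > 0)
    return total / len(points)


centers = [(0.0, 0.0), (10.0, 0.0)]  # assumed already fitted
print(membership_entropy([(0.0, 0.0), (10.0, 0.0)], centers))  # 0.0
print(round(membership_entropy([(5.0, 0.0)], centers), 4))     # 0.6931
```

Points that fall between existing clusters have spread-out memberships and therefore high entropy, so a rising entropy curve over successive windows suggests the data no longer fits the current cluster structure.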

Business process predictive monitoring method

The invention discloses a business process predictive monitoring method. Sequence encoding and sequence distance measurement are performed on an event log: a frequent activity set is used to encode activity sequences, different weights are assigned to frequent activity subsequences and to data attributes, and similar historical data are retrieved for predictive monitoring. The method monitors running processes effectively even when the process model is unknown, and adapts to model changes caused by concept drift. By mining and analyzing the historical information recorded in the event log, it can predict the execution of the current process, for example the next activity, the process outcome, the probability that the process becomes abnormal, and the execution time of activities. Predictive monitoring forecasts a series of subsequent activities from the current incomplete trace prefix, helping an enterprise monitor the execution of each process, discover risks in time, take countermeasures in advance, and improve its resource scheduling.
Owner:JIANGYIN ZHURI INFORMATION TECHNOLOGY CO LTD (江阴逐日信息科技有限公司)

Big data stream type cluster processing system and method for on-demand clustering

Status: Active · Publication: CN103353883A · Tags: Efficient use of; Solve problems that are processed quickly; Special data processing applications; Computer module; Concept drift
The invention discloses a big data stream clustering system for on-demand clustering. The system comprises a fast computation module, a data concept drift detection module, and a clustering module; the output of the fast computation module is connected to the first input of the clustering module through the concept drift detection module, and the clustering module is connected back to the fast computation module. Targeting the massive, similar, and repetitive nature of big data, the on-demand clustering model based on concept drift detection uses a triggered clustering mode that preserves accuracy while providing on-demand clustering and real-time clustering results. In addition, a resource monitoring module and an independent module are provided for clustering, so that existing traditional clustering algorithms can be reused effectively, the system's scalability and sensitivity are enhanced, and data streams are processed quickly in a big data environment. The system can be widely applied in the field of data processing.
Owner:SOUTH CHINA NORMAL UNIVERSITY

Unbalanced-like network traffic classification method and device and computer equipment

The invention relates to the technical field of network traffic classification, and specifically to a classification method, device, and computer equipment for class-imbalanced network traffic. The method comprises: obtaining the network traffic data to be classified and extracting traffic features; deleting irrelevant and redundant features with a feature selection algorithm and reducing the dimensionality of the remaining features to select an optimal feature subset; and feeding the optimal feature subset into a weight-based multi-classifier, training it incrementally to optimize classifier performance, and classifying the network traffic. To address the imbalanced distribution of network traffic samples, removing irrelevant and redundant features effectively improves the recognition rate of minority categories while preserving overall classification accuracy; incremental learning makes model updates more flexible and shortens the update period; and the weight-based multiple classifiers reduce the impact of concept drift.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Visualization method for concept drift of data stream in dynamic data environment

The invention relates to the technical field of intelligent information processing and discloses a visualization method for concept drift in data streams in a dynamic data environment. The method comprises: applying static processing to the data stream; establishing different concept representations for different concept drift modes and saving them in concept pools; and, when a new data block arrives, using the KL divergence to search the concept pools for a similar concept representation, counting it if one exists, and otherwise adding the new data block to the pools as a new concept. The method can detect the drift types of various data streams and analyze the concept drift process thoroughly through these counts; finally, a Bayesian method is used to draw a concept drift and transition graph from the statistics, which is visualized to support data mining at the concept level.
Owner:DALIAN UNIV OF TECH

Data flow concept drift detection method and system

The invention discloses a data stream concept drift detection method and system. The method detects the level of concept drift by analyzing cluster centers, cluster disappearance, and the emergence of new clusters. The system comprises a cluster center analysis unit, a cluster disappearance analysis unit, a new-cluster analysis unit, and a concept drift level analysis unit. The method and system recognize concept drift from multiple aspects, accurately quantify the concept drift evaluation indexes of the dataset under test, analyze concept drift comprehensively, and detect its level accurately. They are applied to detecting data evolution.
Owner:SOUTH CHINA NORMAL UNIVERSITY
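
A minimal sketch of the three analyses named in this abstract (center movement, cluster disappearance, new clusters), assuming the clusterings of an old and a new window are already available as lists of centers. The greedy nearest-center matching and the `match_radius` parameter are hypothetical simplifications; the patent's drift-level scoring is not reproduced.

```python
import math


def compare_clusterings(old_centers, new_centers, match_radius):
    """Match old and new cluster centers and summarize how the structure changed.

    Each old center is matched to the nearest unused new center within
    `match_radius`; unmatched old centers count as disappeared clusters and
    unmatched new centers as newly emerged ones.
    """
    unused = list(range(len(new_centers)))
    shifts, disappeared = [], 0
    for oc in old_centers:
        best = min(unused, key=lambda j: math.dist(oc, new_centers[j]), default=None)
        if best is not None and math.dist(oc, new_centers[best]) <= match_radius:
            shifts.append(math.dist(oc, new_centers[best]))
            unused.remove(best)
        else:
            disappeared += 1
    return {
        "mean_center_shift": sum(shifts) / len(shifts) if shifts else 0.0,
        "disappeared": disappeared,
        "new": len(unused),
    }


old = [(0, 0), (10, 10), (20, 0)]
new = [(1, 0), (10, 10), (30, 30)]
print(compare_clusterings(old, new, match_radius=2.0))
# {'mean_center_shift': 0.5, 'disappeared': 1, 'new': 1}
```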

Streaming data classification method based on decision tree

The invention provides a decision-tree-based streaming data classification method in the technical field of data classification. The method comprises: step 1, constructing a classifier; step 2, classifying the data to be classified with the initial ensemble classification model to obtain a classification result set, and updating the current ensemble classification model when the amount of data in the data container Wintmp reaches the sliding window size; step 3, observing the distribution of the data in the classification result set within the window and using it as the criterion for judging whether concept drift has occurred, which completes concept drift detection; step 4, acquiring historical data, deriving the rule by which the data volume rises and falls within a day, and using that rule to obtain the data volume for a preset time period; and step 5, enlarging or shrinking the data window according to the concept drift detection result and the preset data size. The method improves data classification accuracy, processes data promptly, and improves classification efficiency.
Owner:NORTHEASTERN UNIV

Double-window concept drift detection method based on sample distribution statistical test

The invention discloses a double-window concept drift detection method based on statistical tests of sample distributions, in the field of machine learning. To address concept drift in data streams with time attributes, the method first performs outlier detection in a fixed window using support vector regression (SVR). Then, for the detected outliers, it computes the Euclidean distance between new and old samples in a variable window and, based on these distances, performs statistical analysis with several distribution tests to indirectly reveal whether the data distribution has changed and thus whether drift has occurred. Finally, the method's effectiveness is verified on a cement strength benchmark dataset and a dataset of nitrogen oxide concentrations at the outlet of a municipal solid waste incineration (MSWI) plant.
Owner:BEIJING UNIV OF TECH
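
The abstract does not name its distribution tests, so the sketch below uses the two-sample Kolmogorov-Smirnov test, one common choice, to compare a reference window with a current window. The SVR outlier step and the Euclidean-distance preprocessing are omitted; the critical value uses the standard large-sample approximation, and the function names and data are hypothetical.

```python
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the reference window and the current window."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Step both empirical CDFs past every occurrence of x.
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(cur) and cur[j] == x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Reject "same distribution" when D exceeds the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(reference), len(current)
    c = math.sqrt(-math.log(alpha / 2) / 2)
    return ks_statistic(reference, current) > c * math.sqrt((n + m) / (n * m))


reference = list(range(50))  # fixed (old) window
print(drift_detected(reference, list(range(50))))        # False
print(drift_detected(reference, list(range(100, 150))))  # True
```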

Exception detection method based on data flow concept drift

The invention provides an anomaly detection method based on data stream concept drift, in the field of data mining and anomaly detection, which aims to detect concept drift in time. The method comprises the following steps: S1, obtaining the real data collected by the system under test at different moments to form a real data stream, and building a current prediction model of the system from that stream; S2, predicting the data of the next time period with the prediction model to obtain a predicted data stream; S3, calculating a similarity dataset between the real data stream and the predicted data stream; S4, judging whether concept drift has occurred from the similarity dataset and the system's current concept drift threshold; S5, if not, repeating steps S2 to S4; S6, if so, updating the prediction model, the concept drift threshold, and the anomaly detection threshold, and repeating steps S2 to S6 with the updated prediction model and concept drift threshold.
Owner:TAIYUAN UNIVERSITY OF SCIENCE AND TECHNOLOGY
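
The S1 to S6 loop can be sketched as follows, under loudly hypothetical stand-ins: the prediction model is just the mean of one window, similarity is 1 / (1 + mean absolute error), and "updating the model" means re-fitting it on the window where drift was found. The patent also updates its drift and anomaly thresholds at that point; this sketch keeps them fixed.

```python
def monitor(stream, window=5, drift_threshold=0.5):
    """Return the stream positions at which concept drift is declared."""
    drifts = []
    model_mean = sum(stream[:window]) / window  # S1: initial prediction model
    t = window
    while t + window <= len(stream):
        actual = stream[t:t + window]                 # real data for the next period
        predicted = [model_mean] * window             # S2: predicted data
        mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / window
        similarity = 1.0 / (1.0 + mae)                # S3: similarity
        if similarity < drift_threshold:              # S4: drift check
            drifts.append(t)
            model_mean = sum(actual) / window         # S6: update the model
        t += window                                   # S5/S6: move to the next period
    return drifts


readings = [10.0] * 15 + [20.0] * 10  # hypothetical sensor readings with a level shift
print(monitor(readings))  # [15]
```

After the update at position 15 the model tracks the new level, so the following window is no longer flagged.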

Multi-model cross detection of malicious code based on statistical learning

Status: Active · Publication: CN109033836A · Tags: Early detection of signs of aging; Platform integrity maintenance; Statistical learning; Degradation problem
The invention provides multi-model cross detection of malicious code based on statistical learning, which applies well to the field of malicious code detection. The method introduces credibility, solves the problem of isolation among machine learning models, and provides a platform on which machine learning models can learn from each other. In addition, on this fine-grained statistical learning platform, multiple machine learning models statistically analyze the mutation process of malicious code from different perspectives, which alleviates the degradation problem of a single model, and an APV algorithm is used to identify the concept drift phenomenon, so that the models achieve a common defense.
Owner:NANKAI UNIV

Streaming data integration classification method and device based on concept drift

Status: Inactive · Publication: CN108764322A · Tags: Solve frequently; Addressing Dynamic Concept Drift; Character and pattern recognition; Streaming data; Real-time data
The invention discloses a concept-drift-oriented streaming data ensemble classification method and device, comprising the following steps: acquiring multiple data blocks that contain both labeled and unlabeled sample data; training a single-class base classifier for each class in the data blocks; constructing an ensemble classification matrix from the single-class base classifiers of the data blocks; and, when a new data block arrives, updating the ensemble classification matrix and computing class labels for the unlabeled samples. The method and device largely solve the problem of frequent, dynamic concept drift in data streams within acceptable time complexity, and achieve real-time data stream classification while maintaining classification accuracy.
Owner:QILU UNIV OF TECH

Concept drift detection method based on classification error rate and consistency prediction

Status: Active · Publication: CN112131575A · Tags: Timely identification of degradation phenomena; Efficiently assess sustainability; Character and pattern recognition; Platform integrity maintenance; Misclassification error; Computational model
The invention provides a concept drift detection method based on classification error rate and consistency prediction, in the technical field of computer machine learning and information security. Abrupt concept drift is detected by tracking changes in the model's classification error rate, and gradual concept drift is detected by computing the degree of consistency between misclassified and correctly classified samples, so that both kinds of drift are detected in time while computational overhead stays relatively low. The method thus identifies model degradation promptly at low computational cost. It is mainly used for concept drift detection, supports early judgment of degradation in machine learning classification models, and can serve as a performance monitoring method in application fields such as automated analysis and decision-making in big data environments.
Owner:BEIJING INSTITUTE OF TECHNOLOGY
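
The error-rate half of this method can be illustrated with the Drift Detection Method (DDM) of Gama et al., a well-known detector substituted here for the patent's own rule; the consistency-based check for gradual drift is not reproduced. The class name, `min_samples`, and the simulated error stream are assumptions.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style drift detector over a stream of 0/1 classification errors.

    Keeps the running error rate p and its standard deviation s, remembers
    the (p, s) pair at which p + s was smallest, and reports "warning" when
    p + s exceeds p_min + 2 * s_min and "drift" when it exceeds
    p_min + 3 * s_min.
    """

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.p = 0, 0.0
        self.p_min, self.s_min = math.inf, math.inf

    def update(self, error: bool) -> str:
        """Feed one outcome (True = misclassified) and return the detector state."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n  # running mean error rate
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3 * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


detector = ErrorRateDriftDetector()
# Hypothetical error stream: 20% errors, then the model starts failing every time.
outcomes = ([False] * 4 + [True]) * 20 + [True] * 20
states = [detector.update(e) for e in outcomes]
print(states.index("drift"))  # first position at which drift is declared
```

A steady 20% error rate never triggers the detector; only the jump in error rate after position 100 does, which is the abrupt-drift signature the abstract describes.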

Random classification method and device based on combined classifier

The invention relates to the technical field of classification computation, in particular to a random classification method and device based on a combined classifier. The method comprises the following steps: randomly selecting N classifiers whose types are not all identical to form a combined classifier; selecting a training set and a test set for each selected classifier; training and testing each classifier independently to obtain the average accuracy of the combined classifier; judging from this average accuracy whether an elimination mechanism is triggered; based on the judgment, performing classification to obtain the result of each classifier; and voting over these results to obtain the final classification result. The method reduces overfitting and underfitting, supports both discrete and continuous variables, and copes with the concept drift phenomenon.
Owner:MICRO DREAM TECHTRONIC NETWORK TECH CHINA CO

Recommendation system and method suitable for concept-drift medical scheme

The invention provides a medical scheme recommendation system and method that accommodate concept drift. The system comprises: a user interface module, connected to the recommendation module and used to read initial training samples; a recommendation module, connected to a workflow system, which automatically computes the most suitable medical scheme for the current case from historical case data; and an external database, connected to the recommendation module, which is used to read the initial training samples and stores the details of each case, for example personal information such as name, height, and weight, and specific medical index information. The invention takes into account the influence of concept drift on results as well as factors such as sampling time; it can detect the occurrence of concept drift and correct outdated samples to fit the new rules, making later predictions more accurate.
Owner:RUIJIN HOSPITAL AFFILIATED TO SHANGHAI JIAO TONG UNIV SCHOOL OF MEDICINE +1