75 results about "Semi supervised clustering" patented technology

Chat Categorization and Agent Performance Modeling

Chat categorization uses semi-supervised clustering to provide Voice of the Customer (VOC) analytics over unstructured data via an historical understanding of topic categories discussed to derive an automated methodology of topic categorization for new data; application of semi-supervised clustering (SSC) for VOC analytics; generation of seed data for SSC; and a voting algorithm for use in the absence of domain knowledge / manual tagged data. Customer service interactions are mined and quality of these interactions is measured by “Customer's Vote” which, in turn, is determined by the customer's experience during the interaction and the quality of customer issue resolution. Key features of the interaction that drive a positive experience and resolution are automatically learned via machine learning driven algorithms based on historical data. This, in turn, is used to coach / teach the system / service representative on future interactions.
Owner:24 7 AI INC

Method for selecting regression test case for clustering with semi-supervised information

The invention discloses a method for selecting regression test cases by clustering with semi-supervised information. The method comprises the following steps: recording the execution coverage information of each test case, generating a function execution profile, and representing the test case in a quantitative form; analyzing the historical test results to obtain the constraint relationships among test cases; and analyzing the test cases with a semi-supervised clustering algorithm to obtain the similarities and differences of their execution conditions, understand the relation between program behaviors and the test cases, and effectively reduce the number of test cases in the regression test stage while maintaining a sufficiently high error detection capability. According to the invention, the program is understood through the internal relations of program behaviors revealed by the test cases, based on data mining technology, so that test case selection is easier and more automatic, the test cases can be used more effectively in regression tests, test case selection accuracy is improved, and regression test efficiency is improved.
Owner:NANJING UNIV

Semi-supervised clustering integrated protocol identification system

The invention discloses a semi-supervised clustering integrated protocol identification method. The method comprises the following steps: data packets in a network are acquired; the received network data is analyzed, and each field of the data packets is extracted and counted; the feature code obtained from the analyzed network data is matched against the feature codes preset in a database, and if the match succeeds, the data packets are identified as the corresponding protocol; data that is not successfully matched is subjected to cluster analysis, in which a plurality of base clusterers cluster the data packets, the result is fed back, and the prior label value is modified; and semi-supervised statistical learning is carried out on the clustering result of the network data packets and each known protocol, and a discriminant learner is trained. According to the invention, the terminal protocol identification rate is improved and the amount of calculation is moderate, so efficiency is high; misidentification is unlikely even when a single session generates little traffic; and the method integrates a plurality of identification methods to achieve multi-dimensional identification. The invention also discloses a corresponding semi-supervised clustering integrated protocol identification system.
Owner:SHENZHEN Y& D ELECTRONICS CO LTD

Method and device for personalized searching of commodities sequenced based on attributes

The invention belongs to the technical field of electronic commerce, and relates to a method and a device for personalized searching of commodities in an electronic commerce activity, in particular to a method and a device for personalized searching of commodities which are sequenced based on attributes. The method and the device are used by a user to search and find needed commodities by using a computer during online shopping. The method comprises the following steps of: collecting and analyzing interests of the user in commodity attribute information by analyzing electronic commerce data from the internet; converting the commodity attributes concerned by the user into attribute sequencing knowledge in data mining; merging the attribute sequencing knowledge as future knowledge; clustering the future knowledge by using a semi-supervised clustering method; and finally, sequencing the commodities in a clustering result, and presenting commodity searching results to the user so as to guide the user to select the commodities. The method is simple in technical process, convenient for operation, accurate in information acquisition, scientific in sort order and high in searching speed, and the device is simple in structure and flexible in operation, so the method and the device can be used for replacing a commodity searching technology and commodity searching equipment in the conventional electronic commerce.
Owner:QINGDAO TECHNOLOGICAL UNIVERSITY

Transfer learning method based on semi-supervised clustering

The invention provides a transfer learning method based on semi-supervised clustering. The transfer learning method based on the semi-supervised clustering comprises the following steps: calculating similarity and average similarity of data in each class of target data and auxiliary data; according to the average similarity, obtaining a similarity weight vector of the target data and a class tag; taking the vector with the maximum weight as a tag of the target data; with the target data as a centroid, performing K-means clustering into clusters, wherein the tag, having the maximum proportion of data in each cluster to the total data of the class to which the cluster belongs, is taken as a cluster tag; comparing a classification result with a pre-classification result; and in the finally-formed similarity weight vector of the target data, selecting a data tag with the maximum weight as the data tag of the target data so as to form a final classifier. The invention provides the transfer learning method based on the semi-supervised clustering, which can transfer a classifying method and a classifying technology from one field to another field and can improve the precision of the classification result.
Owner:HARBIN ENG UNIV

Assessment method for transient voltage stabilization of load area of electrical power system

The invention relates to an assessment method for transient voltage stabilization of a load area of an electrical power system and belongs to the field of electrical power system stability analysis and assessment. The method comprises: using data measured by synchronous phasor measurement units as a basis, and building an initial sample database for data mining from a large number of simulation samples; extracting characteristics that reflect the degree of stability of each node through a quantitative evaluation of each node in the area; identifying, by multiple linear regression, sensitivity coefficients that reflect the mutual influence between the nodes in a local area network; using a semi-supervised clustering method to label all the samples; and using a decision tree algorithm for classification learning to obtain a decision tree model, which is then used in online monitoring to assess the global transient voltage stabilization state of the load area of the electrical power system.
Owner:TSINGHUA UNIV

Internet flow distinguishing method

The invention discloses an internet flow distinguishing method. According to a small quantity of marked flow samples and by virtue of offline supervised learning classification, unmarked flows are identified according to the characteristics of classified flows, and application classes of generated flows can be predicted in the early stage of network flow generation, thereby ensuring the promptness of network supervision and classifying the network flows in an actual network environment further. Through further adding new application types in semi-supervised clustering, a correlation chart of application type marks and application types is perfected, and alleged flows in the network are effectively marked, therefore, flow data with accurate application type labels can be obtained in real time. Meanwhile, when the network environment changes, the change of the network environment is reflected in the semi-supervised clustering, and the requirement on the distinguishment of flows in a new network environment is further met.
Owner:UNIV OF JINAN

Self-adaptive semi-supervised network traffic classification method, system and equipment

The invention relates to a self-adaptive semi-supervised network traffic classification method, system and equipment. The method comprises the following steps: acquiring network streams and extracting a preset fixed number of stream features from each network stream to obtain a network stream feature vector; computing the centroid of the network stream feature vectors of each type from the labeled network stream feature vectors, thereby obtaining a vector set M; performing self-adaptive semi-supervised k-means clustering with the vector set M as the initial center points; mapping the network streams in each cluster to the traffic type they belong to according to the maximum posterior probability; and taking the traffic clusters of known type as training data to train an online traffic classifier. The invention further relates to a system comprising an acquisition module, a vector set processing module, a clustering module, a classification module, and an output module. The invention also relates to equipment comprising a processor, a memory, and a computer program stored in the memory and capable of being run on the processor.
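The seeding and mapping steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names are hypothetical, the "self-adaptive" part is not modelled, and the maximum-posterior mapping is approximated by a majority vote over the labeled streams in each cluster.

```python
from collections import Counter
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def seeded_kmeans(labeled, unlabeled, iters=20):
    """labeled: list of (vector, label); unlabeled: list of vectors.

    Returns (centers, assignments, cluster_to_label). Assignments follow the
    order: labeled points first, then unlabeled points.
    """
    # Initial centers: one centroid per known type (the "vector set M").
    by_label = {}
    for vec, lab in labeled:
        by_label.setdefault(lab, []).append(vec)
    centers = [mean(by_label[lab]) for lab in sorted(by_label)]

    points = [vec for vec, _ in labeled] + list(unlabeled)
    assign = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        new_assign = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = mean(members)

    # Map each cluster to the most frequent label among its labeled members
    # (assumption: an empirical stand-in for the maximum-posterior rule).
    cluster_to_label = {}
    for k in range(len(centers)):
        votes = Counter(lab for (_, lab), a in zip(labeled, assign) if a == k)
        cluster_to_label[k] = votes.most_common(1)[0][0] if votes else None
    return centers, assign, cluster_to_label
```

An unlabeled stream's predicted type is then cluster_to_label[assign[i]], where i is its position after the labeled streams.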
Owner:BEIJING UNIV OF POSTS & TELECOMM

Cloud network end cooperative defense method and system based on end-side edge computing

The invention discloses a cloud network end cooperative defense method and system based on end-side edge computing, and relates to information security of an electric power industrial control system. The method comprises the following steps: setting an edge computing center at a terminal side, collecting industrial control system terminal equipment information and communication flow information, defining and identifying attribute characteristics of an electric power industrial control terminal by utilizing equipment fingerprints, automatically collecting the fingerprints of the electric power industrial control terminal equipment by utilizing an Nmap scanning method, establishing a training model by a decision tree algorithm, and achieving the dynamic fingerprint authentication of the terminal equipment; through setting a switch mirror image, intelligent monitoring host flow control and cloud computing center training flow baseline, industrial control terminal equipment flow anomaly detection is realized, and a cloud cooperative defense technology based on edge computing is realized. Through flow data acquisition, information entropy quantification flow characteristic attribute preprocessing and improved semi-supervised clustering K-means algorithm training, abnormal flow detection of the electric power industrial control intranet is realized, and cloud network real-time defense based on abnormal flow detection is realized.
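The abstract does not spell out how flow attributes are quantified by information entropy. A common reading is the Shannon entropy of an attribute's values within a time window; the sketch below assumes that reading, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

For example, a window in which every packet targets the same destination port has entropy 0, while a scan that touches many ports evenly produces a high value, which makes the entropy a compact feature for a clustering step.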
Owner:NORTH CHINA ELECTRIC POWER UNIV (BAODING) +3

Unbalanced text classification method and system combining SVM and semi-supervised clustering

The invention discloses an unbalanced text classification method and system combining SVM and semi-supervised clustering. The unbalanced text classification method comprises the steps of: preprocessing a to-be-processed text to obtain text data in vector format, which serves as the data set; using the training set to train an SVM classifier to obtain a classification model, and using the classification model to predict the test set to obtain the category and confidence of the test set; clustering the data set with a semi-supervised clustering algorithm to obtain the category to which the test set belongs and its confidence; and fusing the categories and confidences obtained by the SVM classifier and the semi-supervised clustering algorithm to obtain the final output. According to the unbalanced text classification method, different types of methods in the technical field of unbalanced text classification are combined so that their advantages complement each other; vectorization and normalization methods are used; and the defect that, when high-dimensional sparse text data are processed, the classification result is inaccurate because labeled texts are too few is overcome. The unbalanced text classification method effectively solves the problem of text class imbalance.
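The abstract does not state the fusion rule. One simple possibility, shown here purely as an assumption, is a weighted sum of the confidences that each learner assigns to its predicted category; the function name and the svm_weight parameter are hypothetical.

```python
def fuse(svm_pred, cluster_pred, svm_weight=0.5):
    """Fuse two (label, confidence) predictions by weighted confidence.

    Confidences are assumed to lie in [0, 1]. Returns (label, fused_score).
    """
    scores = {}
    weighted = ((svm_pred, svm_weight), (cluster_pred, 1.0 - svm_weight))
    for (label, conf), weight in weighted:
        scores[label] = scores.get(label, 0.0) + weight * conf
    best = max(scores, key=scores.get)
    return best, scores[best]
```

When the two learners agree, their weighted confidences add up; when they disagree, the more confident (or more heavily weighted) learner wins.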
Owner:JIANGSU UNIV

Risk control method, device, apparatus and medium for user payment behavior

The present application provides a risk control method, device, apparatus and medium for user payment behavior, which relates to the technical field of data processing. The method comprises the following steps: acquiring a plurality of historical transaction result sample data corresponding to user payment behavior; taking the transaction behavior characteristics and transaction behavior attributes in the historical transaction result sample data as the input and output of a semi-supervised clustering model respectively, and constructing and training the semi-supervised clustering model to obtain a risk identification model; inputting the transaction data to be identified, corresponding to the user's payment behavior, into the trained risk identification model to obtain the risk identification result; and, according to the risk identification result and the service type corresponding to the transaction data to be identified, determining the response operation for the transaction data to be identified. The present application is capable of risk identification in milliseconds with high speed and accuracy. The application can also automatically intercept high-risk payment behavior, thereby improving the security of the user's payment behavior and reducing the user's property loss.
Owner:华青融天(北京)软件股份有限公司

System and method for adaptive categorization for use with dynamic taxonomies

A system, method and computer program product provides a solution to a class of categorization problems using a semi-supervised clustering approach, the method employing performing a Soft Seeded k-means algorithm, which makes effective use of the side information provided by seeds with a wide range of confidence levels, even when they do not provide complete coverage of the pre-defined categories. The semi-supervised clustering is achieved through the introductions of a seed re-assignment penalty measure and model selection measure.
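To illustrate what a seed re-assignment penalty might look like, the sketch below adds a confidence-scaled cost whenever a seeded point is assigned to a cluster other than its seed category. The additive form, the function name, and the penalty parameter are assumptions; the abstract does not give the actual measure.

```python
from math import dist


def assign_with_seed_penalty(point, centers, seed_label=None,
                             confidence=0.0, penalty=1.0):
    """Pick the cluster index with the lowest cost for one point.

    Cost = squared Euclidean distance to the center, plus
    penalty * confidence if the point is a seed and the candidate
    cluster differs from its seed label (given as a cluster index).
    """
    def cost(k):
        c = dist(point, centers[k]) ** 2
        if seed_label is not None and k != seed_label:
            c += penalty * confidence
        return c

    return min(range(len(centers)), key=cost)
```

A high-confidence seed therefore tends to stay in its category even when another center is closer, while a low-confidence seed can be re-assigned by the data.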
Owner:IBM CORP

p2p network traffic detection method

The invention relates to a traffic detection method for a peer-to-peer (P2P) network. The method is used for solving the technical problem of low detection accuracy of the conventional network traffic detection method. The technical scheme is that a classifier is trained in two stages: the number N of positive instance samples in the test sample is approximately estimated by using semi-supervised clustering, and a two stage variable model (TSVM) is then trained according to the value of N. Compared with the background technology, the invention brings the value of N closer to the true value, gives the trained TSVM classifier high stability and robustness, and improves network traffic detection accuracy. A large amount of unmarked data takes part in the training of the classification model, so the advantages of semi-supervised learning are fully utilized; compared with a conventional supervised learning algorithm in which the model is trained only on marked data, the method is therefore more accurate and more stable.
Owner:NORTHWESTERN POLYTECHNICAL UNIV

Hyperspectral small sample classification method based on lightweight network and semi-supervised clustering

The invention relates to a hyperspectral small sample classification method based on a lightweight network and semi-supervised clustering. A lightweight network model is constructed by using a Point-wise convolution kernel, a Depth-wise convolution kernel and double loss. The Point-wise convolution kernel and the Depth-wise convolution kernel can greatly reduce the number of parameters, and reduce the demand for training samples in the network training process. The depth feature space can be more separable through the double-loss strategy, and classification and clustering in the depth feature space are better facilitated. In addition, the semi-supervised approximate order clustering algorithm can select more self-confident pseudo tags, and more favorable conditions are provided for improving the network training effect. According to the method, autonomous extraction and high-precision classification of hyperspectral image depth features and label data are realized under the condition of small samples.
Owner:NORTHWESTERN POLYTECHNICAL UNIV

Cross-project defect prediction method based on semi-supervised clustering data screening

The invention discloses a cross-project defect prediction method based on semi-supervised clustering data screening. The method comprises: using a semi-supervised clustering algorithm to cluster software module data to discover subclusters; collecting, from all the generated subclusters, all cross-project historical software modules having the same marks as the historical software modules of this project, namely the screened cross-project software module data; and using a naive Bayesian classification algorithm to establish a cross-project defect prediction model based on the screened cross-project software module data and all the historical software module data of this project, and predicting the software module data, to be predicted, of this project. The method has the advantages that the cross-project software prediction model is protected from the influence of irrelevant cross-project software module data, full use is made of both cross-project historical software module information and the historical software module information of this project, and the performance of the cross-project software prediction model is enhanced.
Owner:WUHAN UNIV

Semi-supervised text clustering method and device fusing pairwise constraints and keywords

The invention discloses a semi-supervised text clustering method and device fusing pairwise constraints and keywords. The method comprises the following steps of: fusing pairwise constraints to assist in text clustering to obtain an initial feature word weight; fusing the pairwise constraints and keywords and performing the semi-supervised clustering at the same time based on the obtained initial feature word weight; and evaluating and selecting a clustering result according to a user satisfaction degree. The device provided by the invention comprises a pre-processing module, a text clustering module fusing pairwise constraints, a semi-supervised text clustering module fusing pairwise constraints and the keywords, and an evaluation and selecting result module. Since the semi-supervised text clustering method provided by the invention continuously adds keyword information on the basis of fusing pairwise constraint information, the keyword information is used for adjusting the corresponding feature word weight while applying the pairwise constraints to learning the feature word weight; and therefore, the two prior information can be mutually influenced and promoted to obtain a more accurate clustering result.
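For readers unfamiliar with pairwise constraints: a must-link pair should share a cluster, and a cannot-link pair should not. The check below is a minimal sketch in the style of constrained k-means, not this patent's weighting scheme; the function names and the index-pair encoding are assumptions.

```python
def partner(idx, pair):
    """Return the other member of pair if idx belongs to it, else None."""
    a, b = pair
    if a == idx:
        return b
    if b == idx:
        return a
    return None


def violates(idx, cluster, assignment, must_link, cannot_link):
    """True if putting document idx into cluster breaks a constraint.

    assignment maps already-placed document indices to cluster ids;
    must_link and cannot_link are lists of (index, index) pairs.
    """
    for pair in must_link:
        other = partner(idx, pair)
        if other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(idx, pair)
        if other in assignment and assignment[other] == cluster:
            return True
    return False
```

A clustering loop can call violates before each assignment and fall back to the next-nearest cluster when it returns True.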
Owner:QINGDAO TECHNOLOGICAL UNIVERSITY

Three-dimensional seismic data waveform semi-supervised clustering method based on EM algorithm

Inactive | CN104280771A | Avoid loss | Implementing Semi-Supervised Waveform Classification | Seismic signal processing | Data set | Horizon
The invention provides a three-dimensional seismic data waveform semi-supervised clustering method based on the EM algorithm. According to the three-dimensional seismic data waveform semi-supervised clustering method based on the EM algorithm, following processing is conducted on three-dimensional seismic data in a time window on a target layer; the extreme point of a three-dimensional seismic data waveform is searched, a seismic waveform is fit through the Chebyshev polynomials, and a fitting coefficient is taken as a waveform characteristic parameter; fit seismic waveforms of well byway seismic data in the three-dimensional seismic data is classified according to logging information, so that a labeled sample data set containing class information is formed; semi-supervised clustering is conducted on fit seismic waveforms which are not classified according to the EM algorithm, wherein the parameter initial value for iteration of the EM algorithm is given through the labeled sample data set containing the class information, and then clustering is conducted on the waveform characteristic parameter according to the fact that waveforms around the extreme points on the same geologic horizon are similar. According to the three-dimensional seismic data waveform semi-supervised clustering method based on the EM algorithm, logging data are adopted during clustering, classifying precision is improved, and a classification result and actual class information are closely associated.
Owner:GEOPHYSICAL EXPLORATION CO OF CNPC CHUANQING DRILLING ENG CO LTD

Flow data screening method and device

The invention discloses a flow data screening method and device. The device comprises a data acquiring module, a data processing module and a mapping and integrating module; the data acquiring module is used for acquiring heartbeat mechanism flow data of all application software depending on a probe through an analysis port and acquiring label data of the flow data; the data processing module is used for performing semi-supervised clustering processing on the flow data according to the label data and then sending the processed flow data to a data warehouse; the mapping and integrating module is used for mapping various data entities in the data warehouse into a table in a virtual data layer and performing data processing and integrating. According to the flow data screening method and device, data selection, data integrity, data cleansing and data reduction are performed on all applications based on heartbeat mechanisms, the screening validity of the heartbeat mechanism flow data of all the application software is improved, and reliable guarantee is supplied to the subsequent flow data analysis.
Owner:中国移动通信集团甘肃有限公司

Large spatial data clustering algorithm K-DBSCAN based on density

The invention particularly relates to a large spatial data clustering algorithm K-DBSCAN based on density. The algorithm comprises the steps that a density-based clustering parameter is preset: radius R, the minimum neighbor number Min_N, pre-division number K and division iteration number of times T are preset; a data set is divided into K1 subsets according to spatial distribution; the reachable subset of each data subset is calculated to form a reachable subset index; and based on the reachable subset index, spatial clustering based on density is carried out on the data of each subset. According to the technical scheme provided by the invention, density-based unsupervised and semi-supervised clustering can be carried out on the large spatial data set, and efficient and fast parallel clustering calculating is realized.
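The idea of restricting neighbor searches to reachable subsets can be illustrated with a uniform grid. This sketch swaps in a simpler partitioning than the patent's K-subset pre-division: it assumes 2-D points and cells of side R, so every point within radius R lies in the same or an adjacent cell. The function names are hypothetical, and Min_N is taken to exclude the point itself.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, radius):
    """Bucket point indices into square cells of side radius."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / radius), floor(y / radius))].append(i)
    return grid


def neighbors(i, points, grid, radius):
    """Indices of points within radius of point i, scanning only the
    3 x 3 block of cells around it (its "reachable" cells)."""
    cx = floor(points[i][0] / radius)
    cy = floor(points[i][1] / radius)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), []):
                if j != i and dist(points[i], points[j]) <= radius:
                    found.append(j)
    return sorted(found)


def core_points(points, radius, min_neighbors):
    """Indices of points with at least min_neighbors other points in range."""
    grid = build_grid(points, radius)
    return [
        i for i in range(len(points))
        if len(neighbors(i, points, grid, radius)) >= min_neighbors
    ]
```

Because each cell only consults its adjacent cells, the per-cell work is independent and could be distributed across workers, which is the kind of parallelism the abstract refers to.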
Owner:CHINA TOBACCO GUANGXI IND

Kernel function based rare category detection method fusing active learning and nonparametric semi-supervised clustering

The invention relates to a kernel function based rare category detection method fusing active learning and nonparametric semi-supervised clustering. For the problems that marked data points are not fully utilized and category related information needs to be specified in advance in a conventional rare category detection method, the invention proposes the kernel function based rare category detection method fusing the active learning and the nonparametric semi-supervised clustering. A data distribution model is optimized by utilizing small amounts of marked data and large amounts of unmarked data with the nonparametric semi-supervised clustering method, and most representative abnormal points in the unmarked data points are selected out in combination with the active learning and submitted to experts for marking, so that the workload of manual marking in a rare category detection process is reduced, the efficiency of the rare category detection process is improved, and the problem in rare category discovery under a nonlinear condition is solved.
Owner:ZHEJIANG HONGCHENG COMP SYST

Image scene classification method and system combined with semi-supervised clustering

Pending | CN111753874A | Improve classification accuracy | Solve the problem of insufficient labeled samples | Mathematical models | Kernel methods | Classification methods | Machine learning
The invention discloses an image scene classification method and system combined with semi-supervised clustering, and the method comprises the steps of redefining an objective function of semi-supervised Kmeans through employing a labeled sample, and supplementing and defining an objective function of SVM, and obtaining semi-supervised Kmeans clustering and a base learning device based on SVM classification; enabling the two base learners to carry out cooperative training, and forming a selection and iterative training scheme of a pseudo label sample; and finally, according to the confidence coefficient, fusing results of the two learners to obtain a scene image category to which the sample belongs. According to the invention, different types of methods in the image scene classification field are used to construct a base classifier and carry out cooperative training. Meanwhile, a pseudo label sample is introduced to expand a training set, so that the problem of insufficient label samples is effectively solved. Furthermore, clustering is carried out on the label-free samples to obtain the distribution characteristics of the label-free samples, and the concept drift problem is solved. Finally, the labeling cost of the scene image is reduced, concept drift is solved, and the image scene classification accuracy is improved.
Owner:JIANGSU UNIV

Query expansion method based on semi-supervised clustering

The invention provides a query expansion method based on semi-supervised clustering. The query expansion method includes the steps of: (1) performing an initial retrieval for the user query with a query likelihood estimation language module and returning the top n documents of the retrieved results; (2) manually annotating the top k documents of the initial retrieved results and dividing them into a relevant document set and an irrelevant document set; (3) analyzing the top n documents with a semi-supervised clustering algorithm that integrates constraints and distance, and extracting the documents related to the query as feedback documents; (4) selecting expansion words with an expansion word selection module according to the feedback documents and forming a new query from the expansion words and the original query. By learning the relevance between a small number of annotated documents and the query, the relevance between a large number of unknown documents and the query can be accurately estimated and the quality of the feedback documents is improved, so the recall ratio and precision of retrieval are effectively improved.
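Step (4) can be sketched as follows. The abstract does not say how expansion words are scored, so this example assumes the simplest choice, ranking candidate terms by how many feedback documents contain them; the function name and parameters are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, n_terms=3):
    """Append the n_terms most widespread feedback terms to the query.

    feedback_docs is a list of tokenized documents (lists of terms).
    Terms are ranked by document frequency, with ties broken alphabetically.
    """
    query_set = set(query_terms)
    doc_freq = Counter()
    for doc in feedback_docs:
        # Count each term once per document and skip original query terms.
        doc_freq.update(set(doc) - query_set)
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return list(query_terms) + ranked[:n_terms]
```

The quality of the result depends almost entirely on the feedback documents, which is why the method spends its effort on step (3).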
Owner:HARBIN ENG UNIV

A user behavior analysis method based on user power consumption data

The invention discloses a user behavior analysis method based on user power consumption data. The method comprises the following steps: taking a batch user data training set of electricity consumption per hour per day as input; pre-processing the input electricity data, including extracting the electricity characteristics, normalizing the data, and reducing the dimension by principal component analysis; taking the family characteristic information of some users as input; using the constrained seed k-means algorithm and some household information data, carrying out the semi-supervised clustering analysis on the training set of user power consumption data, and constructing the power consumption data models of different types of users; taking as input the data set of power consumption per hour per day of the user to be detected; using the model to detect the abnormal behavior of users. The method can efficiently identify and detect the user behavior according to the real-time power consumption data of the user.
Owner:NANJING UNIV OF SCI & TECH

Large-scale data clustering with dynamic social context

A system and method for dynamic, semi-supervised clustering comprises receiving data attributes, generating a set of ensemble partitions using the data attributes, forming a convex hull using the set of ensemble partitions, generating a simplex vector by performing ensemble clustering on the convex hull, receiving dynamic links, deriving an optimal simplex vector using the simplex vector and the dynamic links, computing a current optimal clustering result using the optimal simplex vector, and outputting the current optimal clustering result.
Owner:AIRBNB

Semi-supervised clustering method and semi-supervised clustering system based on nonnegative matrix factorization

The invention discloses a semi-supervised clustering method based on nonnegative matrix factorization, which comprises the steps of carrying out nonnegative matrix factorization projection on an original data matrix, and acquiring a low-dimension approximate matrix, which has both neighborhood preserving and similarity preserving, of original data; carrying out clustering on the low-dimension approximate matrix of the original data by using an algorithm receiving parameter K to acquire a clustering result; and evaluating the clustering result by using two types of evaluation standards of precision and mutual information. The semi-supervised clustering method disclosed by the invention is based on nonnegative matrix factorization, not only considers neighborhood preserving of the original data, but also considers the consistency of similarity in an original space and a low-dimension manifold subspace, so that the clustering performance is enabled to be greatly improved when prior information is great in amount, and the clustering performance can still be well preserved when the prior information is little. The invention further discloses a semi-supervised clustering system based on nonnegative matrix factorization.
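The two evaluation standards mentioned here can be computed along the following lines. This sketch assumes that "precision" means cluster purity and that mutual information is left unnormalized and measured in nats; the abstract does not confirm either choice.

```python
from collections import Counter
from math import log


def purity(true_labels, clusters):
    """Fraction of points that carry the majority label of their cluster."""
    per_cluster = {}
    for label, cluster in zip(true_labels, clusters):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(true_labels)


def mutual_information(true_labels, clusters):
    """Mutual information (in nats) between labels and cluster ids."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, clusters))
    label_counts = Counter(true_labels)
    cluster_counts = Counter(clusters)
    return sum(
        (n_ij / n) * log(n * n_ij / (label_counts[t] * cluster_counts[c]))
        for (t, c), n_ij in joint.items()
    )
```

A perfect two-class clustering of a balanced dataset gives purity 1 and mutual information ln 2, while a clustering that is independent of the labels gives mutual information 0.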
Owner:ZHANGJIAGANG INST OF IND TECH SOOCHOW UNIV +1

Bank electronic channel abnormal transaction determination method based on semi-supervised learning

The invention discloses a bank electronic channel abnormal transaction determination method based on semi-supervised learning, and relates to the technical field of machine learning. The invention aims to solve problems of high difficulty, low efficiency and poor accuracy in the existing abnormal transaction determination technology. According to the method, further integration and optimization are carried out on the basis that a hidden Markov model and a time sequence model (ARIMA) establish an account-level historical transaction sequence model, and an abnormal transaction behavior is predicted by combining semi-supervised clustering learning on the basis of HMM. Transaction data of each time section are converted into a time sequence vector through semi-supervised clustering learning, and semi-supervised learning is utilized to overcome the problem that label data is rare, an HMM is utilized to fit transaction vectors of everyone to generate a corresponding model, and the semi-supervised learning and the HMM are combined to improve the accuracy of anomaly recognition from two aspects of cross section data and time sequence data. Machine learning is adopted to solve the problem of abnormal transaction determination. Compared with a traditional expert method, the difficulty is greatly reduced, and the working efficiency is improved.
Owner:HARBIN ENG UNIV

Text clustering method

The present invention discloses a text clustering method. The method comprises: finding out a pairwise constraint instance from frequent vocabularies; extracting a frequent vocabulary set from a feature word with a largest weight in each document, so as to find out a positive constraint set and a negative constraint set; expanding the constraint set according to a K nearest neighbor set; and performing clustering according to a division result of the constraint set. According to the method of the present invention, a semi-supervised clustering algorithm is added for clustering the feature word, so that dimensions of vector space are reduced, and experiment efficiency is improved, and feature word clustering becomes more reasonable and reliable with guidance of a small amount of supervision information. In addition, hierarchical collaborative clustering is used for clustering of texts and feature words, so that a clustering effect is improved.
Owner:BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1