254 results about "Text cluster" patented technology

Short text clustering method based on deep semantic feature learning

The invention discloses a short text clustering method based on deep semantic feature learning. The method includes the steps that dimensionality reduction representation is performed on original features under the restraint of local information preservation through traditional feature dimensionality reduction, binarization is performed on an obtained low-dimension actual value vector, and error back propagation is performed with the binarized vector being supervisory information of a convolutional neural network structure to train a model; non-supervision training is performed on a term vector through an outer large-scale corpus, vectorization representation is performed on all words in text according to the word order, and the vectorized words serve as implicit semantic features of initial input feature learning text of the convolutional neural network structure; after deep semantic feature representation is obtained, a traditional K-means algorithm is adopted for performing clustering on the text. By means of the method, extra natural language processing and other specialized knowledge are not needed, design is easy, deep semantic features can be learnt, besides, the learnt semantic features have unbiasedness, and good clustering performance can be achieved more effectively.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Method for constructing public opinion knowledge map based on hot events

The present invention discloses a method for constructing a public opinion knowledge map based on hot events, and belongs to the field of natural language processing. The method comprises: obtaining microblogging texts in real time, processing each microblogging text, constructing text clusters, calculating a topic category to which each text cluster belongs, identifying hot events in each cluster by category, and collecting statistics of multi-dimensional attributes of each hot event; identifying key people and organizations involved in the discussion of the hot events and obtaining the multi-dimensional attributes of the key people and organizations; and constructing a multi-dimensional attribute system and a relationship type among events, people and organizations, taking the relationship among the events, people and organizations as association, and constructing a public opinion knowledge map. According to the method disclosed by the present invention, the hot events, people and organizations can be described from multiple dimensions, and all-directional analysis of hot events, people and organizations can be implemented; and according to the actual needs, the weight of different topic categories can be set, and construction of the public opinion knowledge map of different topics can be realized.
Owner:NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

Chinese short text clustering method

The invention relates to a Chinese short text clustering method, and in particular relates to a Chinese short text clustering method based on word vectors and similarity calculation thereof. The Chinese short text clustering method comprises the following specific steps of: obtaining needed word vectors by utilizing a Word2Vec word vector training model; obtaining weights of all words in a short text set by utilizing a word weight calculation algorithm; according to the word vectors and the weights of all the words, calculating the similarity value between every two texts in the short text set through a short text similarity algorithm; and, according to the similarity value between every two texts in the short text set, clustering short texts. The invention provides a carrying optimization type short text similarity calculation method; the problems of sparse short text grammar characteristics, semantic loss and the like can be solved; on the basis of a graph model, the weights of the words are continuously calculated iteratively, so that the sentence similarity calculation accuracy is increased; and, a density peak clustering method is applied in short text clustering, so that the efficiency of the clustering method is effectively increased.
Owner:FOCUS TECH +1

Chinese electronic case text analysis method and system

The invention provides a Chinese electronic case text analysis method and system. The method includes the following steps: obtaining a case data set by employing case texts; separating value variables and text information through regularization processing, determining value information meaning according to context description, and searching and structurally saving time information of different categories by employing a regular expression; performing word segmentation and part-of-speech tagging on the texts by employing natural language processing, performing further screening with the combination of medical entity identification, and determining positions and types of medical key vocabularies in the texts; analyzing and screening the medical key vocabularies and information; simulating cases by employing the above related results; converting content of the texts to value vectors; adding similarity tags to the case texts; learning marks of the case texts; and screening the cases with the similarity on new case samples from the case data set according to the marks and training results. According to the method and system, for different evaluation standards, different similar case text clusters can be found from existing case texts for each new case text.
Owner:TSINGHUA UNIV

Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit

The invention relates to a clustering method and system of a parallelized self-organizing mapping neural network based on a graphic processing unit. Compared with the traditional serialized clustering method, the invention can realize large-scale data clustering in a faster manner by parallelization of an algorithm and a parallel processing system of the graphic processing unit. The invention mainly relates to two aspects of contents: (1) firstly, designing the clustering method of the parallelized self-organizing mapping neural network according to the characteristic of high parallelized calculating capability of the graphic processing unit, wherein the method comprises the following steps of obtaining a word-frequency matrix by carrying out parallelized statistics on the word frequency of keywords in a document, calculating feature vectors of a text by parallelization to generate a feature matrix of data sets, and obtaining a cluster structure of massive data objects by the parallelized self-organizing mapping neural network; and (2) secondly, designing a parallelized text clustering system based on a CPU / GPU cooperation framework by utilizing the complementarity of the calculating capability between the graphic processing unit (GPU) and the central processing unit (CPU).
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Short text clustering method based on deep learning

The invention discloses a short text clustering method based on deep learning. The method includes the following steps that S101, the semantic similarity between short texts is obtained through calculation through a convolutional neural network; S102, the semantic similarity is applied to a clustering algorithm, and the short texts are clustered. The existing short text clustering accuracy is improved, then mass short text data can be more rapidly and accurately subjected to clustering analysis, the method can be widely applied to the fields such as short text clustering tasks, sentiment analysis and recommendation systems, short text similarity parts and short text clustering parts are calculated through the convolutional neural network without preprocessing input short text data, and the length of the input short texts can be increased.
Owner:RUN TECH CO LTD

Online clustering visualization method of text

The invention provides an online clustering visualization method of a text and belongs to the field of intelligent information processing of computer science. The method aims to introduce type characteristic word marking information to a user to realize the restriction and the optimization on a clustering process and improve the definition and the intelligibility of a text clustering structure; and an online clustering technology of the text is designed to realize increment clustering on a text data flow, keep the stability of the whole body of the clustering structure and update a model in a self-adaptive manner. The invention designs an online type high-dimensional data dimension-reducing and arrangement method to be suitable for large-scale data or a data flow environment; and the dimension reduction and the arrangement are carried out on a clustered text type distribution vector, so as to realize the increment visualization of text data and realize the visualized display of the text data and the type structure in a two-dimensional or three-dimensional Euclidean space.
Owner:中国人民解放军总参谋部第五十七研究所

Text clustering multi-document automatic abstracting method and system for improving word vector model

The invention discloses a text clustering multi-document automatic abstracting method and a system for improving a word vector model. The CBOW model with Hierarchical Softmax belongs to the field of large-scale model training. Based on the method, the TensorFlow deep learning framework is introduced into word vector model training; the problem of time efficiency on a large-scale training set is solved through streaming processing calculation; TF-IDF is introduced first during sentence vector representation, then the semantic similarity of a semantic unit to be extracted is calculated, weighting parameters are set for comprehensive consideration, and a semantic weighted sentence vector is generated. The beneficial effects are as follows: the advantages and disadvantages of semantics, deep learning and machine learning are comprehensively considered; density clustering and convolutional neural network algorithms are applied; the degree of intelligence is high; statements highly relevant to the central content can be quickly extracted to serve as the abstract of the text; and various machine learning algorithms are applied to automatic text abstracting to achieve a better abstract effect. The method is possibly the main future research direction in the field, and in addition, the system according to the invention supplies a tool for automatic extraction of a document abstract based on the method.
Owner:上海晏鼠计算机技术股份有限公司

Dialogue short text clustering method based on form and semantic similarity

The invention discloses a dialogue short text clustering method based on form and semantic similarity. The form similarity adopts character string editing distance similarity, and the semantic similarity is based on HowNet and WordNet knowledge bases; weight values of the short text and words are introduced during the calculation of the short text similarity. The dialogue short text clustering method based on the form and semantic similarity solves the problems of certain irregular and input wrong noise information, synonyms and semantic gaps included in the dialogue short text to a certain extent, and consequently, relatively great improvement is realized in comparison with a word bag vector based clustering method.
Owner:EAST CHINA NORMAL UNIV
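
A minimal sketch of how a form similarity and a semantic similarity could be blended, as the dialogue short text entry above describes. The form part uses Levenshtein edit distance; the semantic part is passed in as a callback because the HowNet and WordNet lookups are not specified in the abstract. The function names, the `alpha` weight, and the linear blend are illustrative assumptions, not the patented formula.

```python
from typing import Callable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def combined_similarity(a: str, b: str,
                        semantic_sim: Callable[[str, str], float],
                        alpha: float = 0.5) -> float:
    """Assumed linear blend: alpha * form + (1 - alpha) * semantic.

    `semantic_sim` is a hypothetical stand-in for a knowledge-base lookup.
    """
    return alpha * form_similarity(a, b) + (1.0 - alpha) * semantic_sim(a, b)
```

A misspelled input such as "helo" stays close to "hello" through the form term even when the semantic callback has no entry for the misspelling, which is the kind of input noise the abstract mentions.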

Text clustering method, electronic device and storage medium

The invention discloses a text clustering method. The method comprises the steps of receiving a text clustering instruction sent by a user; pre-training a pre-determined initial language model by utilizing the to-be-clustered corpus to obtain a target language model; sequentially inputting each text in the to-be-clustered corpus into the target language model for feature extraction, obtaining a sentence vector of each text in the to-be-clustered corpus according to a model output result, and generating a to-be-clustered sentence vector set; and, by utilizing a preset clustering algorithm, clustering the to-be-clustered corpora based on the to-be-clustered sentence vector set to obtain sentence vectors corresponding to each category, and determining a clustering result of the to-be-clustered corpora. The invention further discloses an electronic device and a computer storage medium. By utilizing the method and the device, the text clustering accuracy and efficiency can be improved.
Owner:招商局金融科技有限公司

Relationship linking method based on knowledge map

The invention relates to a relationship linking method based on a knowledge map. The method comprises the steps that firstly, a ternary group < subject, relation, object > list containing a certain relation is found using a SparQL query statement from a knowledge mapping domain, and a relation text is matched from an unstructured text; a similarity matrix of the relation text is obtained by using an LSWMD algorithm, then clustering is conducted on the relation text by using a density peak clustering algorithm, and a relation text cluster is obtained; the position of all the words in the cluster is extracted based on the relation text cluster, fitting is conducted using the beta distribution, and a word distribution mode of the relation text cluster is obtained; for the candidate relation text of unestablished relation in the unstructured text of an open domain, the vector is constructed using the word distribution mode, a GBDT classifier is used for carrying out the identification, and linking with the knowledge mapping domain is achieved. According to the relationship linking method based on the knowledge map, the problem of insufficient link between a natural language and the knowledge map is effectively solved, and it is helpful for the computer to understand the natural language better.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
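
The density peak clustering step named in the relationship linking entry above can be sketched roughly as follows. This is a generic, simplified version of density peak clustering over a precomputed distance matrix (for example, one minus a similarity matrix); the `cutoff` parameter, the fixed number of centers `k`, and the tie-breaking by index are assumptions made for illustration, and the LSWMD similarity itself is not reproduced.

```python
from typing import List


def density_peak_clustering(dist: List[List[float]], cutoff: float, k: int) -> List[int]:
    """Return a cluster label per point, given a symmetric distance matrix."""
    n = len(dist)
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n   # distance to the nearest point visited earlier
    parent = [-1] * n   # that nearest earlier point
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point gets the largest distance
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centers are the k points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:k]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Every other point inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Because points are labeled in order of decreasing density, each point's parent already has a label by the time the point is reached.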

Method and apparatus for selecting text classification training sets

The invention discloses a method and an apparatus for selecting text classification training sets, relates to the technical field of computers, and solves the problems of low speed, large error and low efficiency of an existing text training set classification mode. According to the main technical scheme, the method comprises the steps of performing similarity clustering on texts in training sets according to a predetermined clustering algorithm by utilizing cosine similarity to obtain a plurality of text clusters; extracting a representative text from each text cluster, wherein the representative text and other texts in the cluster in which the representative text is located have common similar features; determining a text classification tag of the representative text according to a predetermined keyword; and adding all texts in the text cluster, in which the representative text is located, to a text training set corresponding to the text classification tag. The method and the apparatus are mainly used for classification selection of the text training sets.
Owner:BEIJING GRIDSUM TECH CO LTD
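
The training set selection steps above (cluster by cosine similarity, pick a representative text per cluster, label it by keyword, then assign the whole cluster to that label) can be sketched as follows. The greedy single-pass clustering, the whitespace tokenization, the `threshold` value, and the `keyword_to_label` mapping are all assumptions made for the example; the abstract only refers to "a predetermined clustering algorithm" and "a predetermined keyword".

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_texts(texts: List[str], threshold: float = 0.5) -> Tuple[List[List[int]], List[Counter]]:
    """Greedy clustering: a text joins the first cluster whose seed is similar enough."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters: List[List[int]] = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters, vectors


def representative(members: List[int], vectors: List[Counter]) -> int:
    """Pick the member with the highest total similarity to its cluster."""
    return max(members, key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members))


def build_training_sets(texts: List[str], keyword_to_label: Dict[str, str],
                        threshold: float = 0.5) -> Dict[str, List[str]]:
    """Label each cluster via its representative and collect texts per label."""
    clusters, vectors = cluster_texts(texts, threshold)
    training: Dict[str, List[str]] = {}
    for members in clusters:
        rep = representative(members, vectors)
        label = next((lab for kw, lab in keyword_to_label.items() if kw in vectors[rep]), None)
        if label is not None:  # clusters with no matching keyword stay unlabeled
            training.setdefault(label, []).extend(texts[i] for i in members)
    return training
```

Only one text per cluster has to be matched against keywords, which is where the speed-up claimed in the abstract would plausibly come from.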

Method and device for processing data and knowledge graph

Provided are a method and device for processing data and a knowledge graph. The method comprises the steps that one or more first-level key words are screened out of words corresponding to all domain corpus data through key word screening operation, and each piece of domain corpus data is matched to the first-level key words; one or more second-level key words are screened out of domain corpus data corresponding to each first-level key word through key word screening operation, and the domain corpus data is matched to the second-level key words; the screening and matching processes are constantly repeated till the M-level key words are screened out, and the domain corpus data is matched to the M-level key words, wherein M is larger than or equal to 2; the domain corpus data corresponding to the M-level key words is subjected to text clustering according to semantics. The method is used for processing the data, and the processed data can be searched for content needed by a user faster and more accurately.
Owner:SHANGHAI XIAOI ROBOT TECH CO LTD

A short text clustering method based on weighted word vector representation and combinatorial similarity

The invention discloses a short text clustering method based on weighted word vector representation and combination similarity. The method comprises: performing short text preprocessing; constructing a weighted word vector representation of each short text; calculating the Euclidean distance similarity and the corotation similarity between short texts, and constructing the combination similarity matrix; constructing a low-dimensional vector representation of each short text; and finally applying K-means to achieve more accurate short text clustering.
Owner:上海文军信息技术有限公司

Iteration text clustering method based on self-adaptation subspace study

The invention discloses an iteration text clustering method based on self-adaptation subspace study. The method includes the following steps: (1) initiation: text linguistic data is expressed as a text vector space, initial K clusters are generated through an affine propagation clustering method, and all text clustering categories are expressed as an initial category affiliation indication matrix; and (2) iteration between the subspace projection and the clusters: the initial category affiliation indication matrix is used as prior knowledge, a maximum average neighborhood edge is used as a target to solve a subspace projection matrix, the text vector space is projected to a subspace, K clusters are generated through the affine propagation clustering method in the subspace, and a category affiliation indication matrix is updated; and a convergent function is calculated based on the subspace projection matrix and the category affiliation indication matrix till the function is converged, iteration exits, and text clustering is finished. The iteration text clustering method does not limit the capacity and distribution of text data, subspace solution and clusters are fused under a uniform frame, and an overall optimal clustering result is obtained through an iteration strategy.
Owner:广东南方报业传媒集团新媒体有限公司

Class center compression transformation-based text clustering method in search engine

The invention discloses a class center compression transformation-based text clustering method in a search engine. The method comprises the following steps of: by using an improved tf-idf formula, calculating word weight of each file in a text set, calculating an initial class center, mining a synonym word set and a concurrent high-frequency word set, calculating a word center and performing primary classification according to similarity of the initial class center with each file; compressing the center word according to information such as title word, article length, synonyms and concurrent associated words, thereby guaranteeing that the same word only occurs in some class centers with high similarity with the word; clustering the file by using a new cluster center again; calculating core similarity of each class; splitting the biggest class; combining smaller classes to produce a new class; iterating compression, clustering and split operation until the number of the classes converges; and guaranteeing that the similarity of the text in the same class with the cluster center reaches a certain threshold value. The clustering accuracy is obviously higher than those of the conventional methods such as KMeans and DBSCAN (Density-based Spatial Clustering of Applications with Noise).
Owner:珠海市颢腾智胜科技有限公司

Text clustering method on basis of automatic threshold fish swarm algorithm

The invention discloses a text clustering method on the basis of an automatic threshold fish swarm algorithm. The text clustering method includes computing a similarity matrix of feature vectors of texts, acquiring an initial equivalent partitioning threshold of each text by a corresponding row of elements of the similarity matrix, performing initial equivalent partitioning for the texts and determining an initial clustering number and an initial clustering center; and adopting the artificial fish swarm algorithm in a combination manner, updating the state of each artificial fish according to global optimal information and local optimal information, searching a global optimal clustering center and clustering initial clustering results again. The text clustering method has the advantages that the initial clustering number and the initial clustering center are acquired by a process for automatically acquiring the thresholds, the global optimal clustering center is searched by the aid of the artificial fish swarm algorithm, accordingly, shortcomings that the traditional clustering method is sensitive to initial values and only relies on local data characteristics and the like are overcome, and the text clustering accuracy and the text clustering intelligence can be improved.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Method and device for calculating text similarity and realizing search processing through computer

The invention provides a method and a device for calculating text similarity and realizing search processing achieved through a computer. The method comprises the following steps: acquiring a first text cluster and a second text cluster; decoding the first text cluster according to a preset phrase translation model and a dependency structure model to obtain K translation text clusters; respectively calculating a first semantic similarity value between the K translation text clusters and the second text cluster, and calculating a second semantic similarity value between the first text cluster and the second text cluster according to the K calculated semantic similarity values. By adopting the method and the device, the problem of long distance dependency relationship in sentences is solved, the semantics of searched sentences can be relatively well expressed, the searched sentences can be relatively well matched with webpage titles, and a user can obtain semantic matching search result items, so that the search experience of the user is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Text clustering method and system

The invention relates to a text clustering method and system. The text clustering method comprises the following steps: keywords of to-be-classified texts are extracted when the to-be-classified texts are received; the keywords of the to-be-classified texts are matched according to the obtained keywords in a final word bag, and the type tag of the to-be-classified text is obtained; the final word bag is obtained by sorting and screening the key words in various type tag word bags according to preset selection rules; the type tag word bags are sets of key words generated after key word extraction from texts corresponding to type tags. The key words corresponding to each tag are extracted through records of existing tags, the final word bag is obtained, to-be-classified texts are classified according to the key words in the final word bag, good adaptability to noise data is realized, and the condition that the accuracy is reduced substantially under the condition of more noise is avoided; an approximate string matching effect is improved greatly through large-range thresholding of a centroid.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD
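
A rough sketch of the word-bag flow in the entry above: build a keyword bag per type tag from already-tagged texts, screen them into a final word bag, then tag new texts by keyword matching. Keyword extraction is approximated here by whitespace term counts, and the "preset selection rule" is assumed to be "keep the `top_k` most frequent words that occur under exactly one tag"; both are illustrative choices, not details from the abstract.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def build_tag_bags(labeled_texts: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Collect term counts for each type tag from (text, tag) records."""
    bags: Dict[str, Counter] = {}
    for text, tag in labeled_texts:
        bags.setdefault(tag, Counter()).update(text.lower().split())
    return bags


def build_final_bag(bags: Dict[str, Counter], top_k: int = 3) -> Dict[str, Set[str]]:
    """Assumed rule: drop words shared between tags, keep the top_k per tag."""
    tags_per_word = Counter(word for bag in bags.values() for word in bag)
    final: Dict[str, Set[str]] = {}
    for tag, bag in bags.items():
        unique = [(w, n) for w, n in bag.items() if tags_per_word[w] == 1]
        unique.sort(key=lambda pair: (-pair[1], pair[0]))  # frequency, then alphabet
        final[tag] = {w for w, _ in unique[:top_k]}
    return final


def classify(text: str, final_bag: Dict[str, Set[str]]) -> Optional[str]:
    """Return the tag whose keywords overlap the text most, or None if none match."""
    words = set(text.lower().split())
    scores = {tag: len(words & keywords) for tag, keywords in final_bag.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Dropping words that appear under several tags is one simple way to keep noisy, non-discriminative terms out of the final bag.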

System and method for machine-assisted segmentation of video collections

According to various embodiments, a system for accessing video content is disclosed. The system includes one or more processors on a video hosting platform for hosting the video content, where the processors are configured to generate an automated transcription of the video content and apply text clustering modules based on a trained neural network to segment the video content.
Owner:THE TRUSTEES FOR PRINCETON UNIV

Text clustering method based on weak supervised deep learning

The invention discloses a text clustering method based on weak supervised deep learning. The method comprises the following steps: (1) using an image data set with text click information, image visual information and image category labels, and adopting image amplification and clustering to construct an image category click feature matrix for each text; (2) obtaining a smooth image click feature map from the initial class click matrix by using a sorting and propagation method, performing text clustering on the feature map to obtain an initial text category, and initializing the text weight by utilizing the click prior; (3) under the condition of minimizing the intra-class mean square error, building a deep text clustering model to learn deep text features; (4) performing joint optimization on the deep model and the text weight by using a weak supervised learning method, and iteratively updating the deep model and the text weight; and (5) extracting deep text features through the deep text model and clustering them with the K-means method. The method has very high universality, and the semantic gap in image recognition is effectively solved.
Owner:HANGZHOU DIANZI UNIV

Short text clustering-based labeling system and method

The invention relates to a short text clustering-based labeling system and method, belongs to the technical field of clinical medical labeling, and solves the problems of low labeling efficiency, difficult training, poor result accuracy and excessively high communication cost in the prior art. The short text clustering-based labeling system comprises an input module, a text clustering algorithm module, a multi-text alignment module, a result display module, a quick labeling module and an output module. Compared with the prior art, the labeling system and method has the advantages that a text clustering algorithm and a multi-text alignment algorithm are adopted, so that the reading quantity of similar sub-texts is greatly reduced and the reading speed is increased; longitudinal multi-text comparative browsing is adopted, so that the great convenience is provided for a user to perform manual comparison; and furthermore, algorithm training can be performed without any training set, and IT personnel do not need to perform algorithm modification for different medical texts, so that the communication cost is extremely low.
Owner:思派(北京)网络科技有限公司

Fraud call detection method and device

The invention discloses a fraud call detection method. The method comprises the following steps that all call voices are converted into texts, thereby forming a text set; each text in the text set is converted into a key word weight vector; a plurality of clusters are formed by performing text clustering on all key word weight vectors, and whether each cluster is a fraud cluster or not is determined according to a fraud keyword set; calls corresponding to all the keyword weight vectors in the fraud clusters are determined as fraud calls; a text social network is constructed by utilizing all the calls and keywords, nodes corresponding to the fraud calls are marked as the fraud calls in the text social network, and other nodes marked as the fraud calls are determined through label propagation; and the calls corresponding to all the nodes marked as the fraud calls are determined as the fraud calls. The method can be applied to various fraud types, and meanwhile, user sensitive data does not need to be acquired, so that the operability is higher.
Owner:BEIJING UNIV OF POSTS & TELECOMM +1
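
The label propagation step in the fraud call entry above could look roughly like the sketch below. It assumes the "text social network" is a bipartite graph linking each call to its keywords, that the seed fraud calls come from the earlier clustering step (not shown), and that a call is flagged once its propagated score reaches a `threshold`. The averaging update, `iterations`, and `threshold` are illustrative assumptions; the abstract does not specify the propagation rule.

```python
from typing import Dict, Set


def propagate_fraud_labels(call_keywords: Dict[str, Set[str]],
                           seed_fraud: Set[str],
                           iterations: int = 3,
                           threshold: float = 0.5) -> Set[str]:
    """Spread fraud scores from seed calls to other calls via shared keywords."""
    # Invert the mapping to get the keyword side of the bipartite graph.
    keyword_calls: Dict[str, Set[str]] = {}
    for call, keywords in call_keywords.items():
        for kw in keywords:
            keyword_calls.setdefault(kw, set()).add(call)

    score = {call: 1.0 if call in seed_fraud else 0.0 for call in call_keywords}
    for _ in range(iterations):
        # A keyword's score is the mean score of the calls that contain it.
        kw_score = {kw: sum(score[c] for c in calls) / len(calls)
                    for kw, calls in keyword_calls.items()}
        updated = {}
        for call, keywords in call_keywords.items():
            if call in seed_fraud:
                updated[call] = 1.0  # seed labels stay clamped
            elif keywords:
                updated[call] = sum(kw_score[k] for k in keywords) / len(keywords)
            else:
                updated[call] = 0.0
        score = updated
    return {call for call, s in score.items() if s >= threshold}
```

With only positive seeds clamped, scores of connected calls keep drifting upward as iterations increase, so in this sketch the iteration count and threshold together bound how far a fraud label can spread.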

Semi-supervised text clustering method and device fusing pairwise constraints and keywords

The invention discloses a semi-supervised text clustering method and device fusing pairwise constraints and keywords. The method comprises the following steps of: fusing pairwise constraints to assist in text clustering to obtain an initial feature word weight; fusing the pairwise constraints and keywords and performing the semi-supervised clustering at the same time based on the obtained initial feature word weight; and evaluating and selecting a clustering result according to a user satisfaction degree. The device provided by the invention comprises a pre-processing module, a text clustering module fusing pairwise constraints, a semi-supervised text clustering module fusing pairwise constraints and the keywords, and an evaluation and selecting result module. Since the semi-supervised text clustering method provided by the invention continuously adds keyword information on the basis of fusing pairwise constraint information, the keyword information is used for adjusting the corresponding feature word weight while applying the pairwise constraints to learning the feature word weight; and therefore, the two prior information can be mutually influenced and promoted to obtain a more accurate clustering result.
Owner:QINGDAO TECHNOLOGICAL UNIVERSITY

Spark-based multi-feature combined efficient Chinese text clustering method

The invention discloses a Spark-based multi-feature combined efficient Chinese text clustering method. The method comprises the following steps of: uploading mass data sets into an HDFS file system by utilizing high fault tolerance and high data access throughput of the HDFS file system, carrying out data preprocessing and submitting the data sets to a Spark cluster; and after text set preprocessing is completed, respectively calculating a semantic similarity and a word frequency statistics-based cosine similarity of a dimensionality-reduced text, combining the two similarities to obtain a final text similarity, and carrying out text clustering by utilizing the obtained text similarity and combining a maximum distance method. According to the method, semantic information and word frequency statistics information are combined to ensure that the text similarity calculation is more correct and the number of iterations is greatly decreased at the same time.
Owner:NANJING UNIV OF SCI & TECH
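
A single-machine sketch of the two ideas in the entry above: blending a semantic similarity with a word-frequency cosine similarity, and using a maximum distance rule to spread out the initial cluster centers. The Spark and HDFS parts are omitted. The "maximum distance method" is interpreted here as max-min center selection, which is an assumption; the `weight` parameter, the `semantic_sim` callback, and the whitespace tokenization (text is assumed to be already word-segmented) are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity over raw term frequencies of pre-segmented text."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va if t in vb)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def combined_similarity(a: str, b: str,
                        semantic_sim: Callable[[str, str], float],
                        weight: float = 0.5) -> float:
    """Assumed blend: weight * semantic + (1 - weight) * term-frequency cosine."""
    return weight * semantic_sim(a, b) + (1.0 - weight) * tf_cosine(a, b)


def max_min_centers(texts: List[str], k: int,
                    sim: Callable[[str, str], float]) -> List[int]:
    """Pick up to k center indices, each as far as possible from those already chosen.

    Distance is taken as 1 - similarity; the first text is the first center.
    """
    if not texts or k <= 0:
        return []
    centers = [0]
    while len(centers) < min(k, len(texts)):
        candidates = (i for i in range(len(texts)) if i not in centers)
        farthest = max(candidates,
                       key=lambda i: min(1.0 - sim(texts[i], texts[c]) for c in centers))
        centers.append(farthest)
    return centers
```

Well-separated starting centers are one plausible reason a clustering loop would need fewer iterations, which matches the benefit the abstract claims.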

Scalable and accurate mining of control flow from execution logs across distributed systems

Methods and arrangements for efficiently mining a control flow graph from execution logs of a distributed system. Using at least one text clustering technique, at least two text clusters are generated from the plurality of execution logs. At least one approximate template is generated based on the at least two text clusters. At least one refined template is created via refining the at least one approximate template using multimodal sequencing. The control flow graph is created based on the at least one refined template. An anomaly is detected in the control flow graph.
Owner:IBM CORP

Short text clustering analysis method, device and terminal device

The invention is applicable to the technical field of text analysis, and provides a short text clustering analysis method, a device and a terminal device. The method comprises the following steps: acquiring a short text data set to be clustered, and preprocessing the short text data set to obtain an initial word set including at least three parts of speech. The initial word set is extracted to obtain a feature word set including a topic feature word set and a topic related word set. The preset number of subject feature words and subject related words are determined according to the relevance of subject feature words and subject related words. The subject feature words and subject related words correspond one by one to form knowledge pairs. The preset number of knowledge pairs is input into the LDA for clustering and the emotional theme of the short text data set to be clustered is determined. The invention optimizes the text analysis algorithm, can more accurately carry out the emotional theme clustering of the short text, and improves the efficiency of the short text clustering.
Owner:HEBEI UNIV OF ENG

Characteristic quantification method of graininess-variable text cluster

Inactive · CN101436201A · Improve semantic sensitivity · Meet the requirements for observing text information · Special data processing applications · Granularity · Cluster systems
The invention provides a variable granularity text clustering characteristic quantification method, which is realized by the following steps: firstly, concept expansion of keywords of a file, namely a keyword set in the file is expanded into a concept word set with higher semantic covering capacity by utilization of a knowledge network; secondly, calculation of characteristic representation and similarity, namely the similarity between words can be comprehended as the overlap ratio of common characteristics, and the similarity between files which apply text clustering can also be judged by examining the number of the common characteristics between the files; and thirdly, achievement of the effect of variable granularity clustering through combined use of the variable granularity text clustering characteristic quantification technology and detailed clustering algorithms. The variable granularity text clustering characteristic quantification method overcomes the defect of poor clustering effect under the condition of variable granularity clustering due to inappropriate characteristic quantification of the prior file clustering system.
Owner:HARBIN INST OF TECH
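
The first two steps of the variable granularity entry above (concept expansion of keywords, then similarity as the overlap ratio of common features) can be sketched as follows. The `concept_map` dictionary is a hypothetical stand-in for the knowledge network, and the Jaccard ratio is one assumed way to measure "overlap of common characteristics"; the abstract does not give the exact formula.

```python
from typing import Dict, Iterable, Set


def expand_concepts(keywords: Iterable[str], concept_map: Dict[str, Set[str]]) -> Set[str]:
    """Add the broader concepts of each keyword to the keyword set."""
    expanded = set(keywords)
    for word in list(expanded):
        expanded.update(concept_map.get(word, ()))
    return expanded


def overlap_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard ratio: shared features divided by all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def document_similarity(keywords_a: Iterable[str], keywords_b: Iterable[str],
                        concept_map: Dict[str, Set[str]]) -> float:
    """Compare two documents by the overlap of their expanded concept sets."""
    return overlap_similarity(expand_concepts(keywords_a, concept_map),
                              expand_concepts(keywords_b, concept_map))
```

In this sketch, a concept map that links keywords to broader concepts makes more documents overlap, giving coarser clusters, while an empty map falls back to exact keyword matching; that is one way the granularity could be varied.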