
76 results about "Topic mining" patented technology

Topic mining using natural language processing techniques

The disclosed embodiments provide a method, system and apparatus for processing data. During operation, the system obtains a set of content items containing unstructured data. Next, the system obtains a set of part-of-speech (POS) tags for lexical items in the set of content items. The system then uses a computer to match the POS tags to one or more POS tagging patterns to obtain a set of candidate topics for the set of content items and extract a set of topics for the set of content items from the set of candidate topics.
Owner:MICROSOFT TECH LICENSING LLC
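
The pattern-matching step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the tokens are already POS-tagged, and the pattern (zero or more adjectives followed by one or more nouns), the tag names and the function name `candidate_topics` are all hypothetical choices.

```python
import re

# Hypothetical pattern: zero or more adjectives followed by one or more nouns.
# Each tag is encoded as one letter so a regular expression can scan the sequence.
TAG_CODES = {"ADJ": "A", "NOUN": "N"}
CANDIDATE_PATTERN = re.compile(r"A*N+")


def candidate_topics(tagged_tokens):
    """Return phrases whose POS tag sequence matches CANDIDATE_PATTERN.

    tagged_tokens is a list of (word, tag) pairs; tags outside TAG_CODES
    are encoded as "x" and therefore never match.
    """
    encoded = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged_tokens)
    phrases = []
    for match in CANDIDATE_PATTERN.finditer(encoded):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


tagged = [
    ("the", "DET"), ("new", "ADJ"), ("search", "NOUN"), ("engine", "NOUN"),
    ("ranks", "VERB"), ("relevant", "ADJ"), ("documents", "NOUN"),
]
print(candidate_topics(tagged))  # ['new search engine', 'relevant documents']
```

A later step would still have to filter these candidates down to the final topic set; the abstract does not say how, so that part is left out.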

Short text topic model mining method based on word network to extend characteristics

A short text topic model mining method that extends features based on a word network comprises a weighted word network construction step, a short text feature extension step, and a topic mining step. The weighted word network construction step comprises preprocessing the text by performing Chinese word segmentation on each text in a short text corpus and deleting stop words, then building a weighted word network from the segmented documents, wherein the nodes are words, each edge represents the co-occurrence of two words in the same document, and the weight of the edge is the number of times the two words co-occur across the whole corpus. The short text feature extension step comprises treating the word nodes contained in each segmented short text as a community of the established weighted word network. This approach to short text feature sparsity, based on the community modularity of the word network, addresses the poor performance of the LDA topic model on short texts and improves the accuracy of the short text topic model.
Owner:NANJING UNIV
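
The word network construction can be sketched as below. This is a rough illustration under stated assumptions: the documents are already segmented and stripped of stop words, a word pair is counted once per document, and the function name `build_word_network` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_word_network(documents):
    """Map each unordered word pair to the number of documents containing both."""
    edge_weights = Counter()
    for tokens in documents:
        # Use the set of distinct words so a pair counts once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            edge_weights[pair] += 1
    return edge_weights


docs = [
    ["topic", "model", "text"],
    ["topic", "model"],
    ["short", "text"],
]
network = build_word_network(docs)
print(network[("model", "topic")])  # 2
```

Pairs are stored in sorted order, so a lookup must use the alphabetically smaller word first.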

Low-rank decomposition based delicate topic mining method

The invention discloses a delicate (fine-grained) topic mining method based on low-rank decomposition. The method comprises the following steps: performing word segmentation and stop-word removal on an original corpus text; generating a topic matrix from the word frequency matrix obtained by this preprocessing; and using the topic matrix to decompose the original corpus text into topic background and keywords. The method proposes a delicate model for expressing text content without introducing a new latent variable. The model uses LDA (Latent Dirichlet Allocation) to extract the topic distribution of a text collection and, because text topics are made up of different aspects, applies robust principal component analysis (an improved variant of principal component analysis) to decompose each topic into a low-rank part and a sparse part. The low-rank part represents the common words under the topic, and the sparse part captures the delicate descriptions of the topic from different angles. This achieves a delicate expression of the text and addresses the limitation that conventional topic models can mine only the topic background of a text and cannot describe its points of emphasis in detail.
Owner:INST OF ELECTRONICS CHINESE ACAD OF SCI
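
For orientation, robust principal component analysis is commonly posed as the convex problem below. The abstract does not state which objective the patent uses, so this is the standard textbook form rather than the patented one; the symbols T (topic matrix), L (low-rank part), S (sparse part) and the weight λ are notation assumed here.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad T = L + S
```

Here the nuclear norm encourages L to be low-rank (the words common to a topic) and the entrywise 1-norm encourages S to be sparse (the aspect-specific words).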

Document topic mining method and apparatus

The present application proposes a document topic mining method and apparatus. The method comprises: according to a preset topic mining number, performing loop iteration processing on information in at least one received document based on a probabilistic latent semantic analysis model, and acquiring a posteriori estimate of each topic implied by each sentence in each document; according to the posteriori estimate of each topic, acquiring a membership weight of each word in each topic in each sentence; and generating a topic set corresponding to the topic mining number, wherein each topic set comprises a word related to each topic and screened out according to the membership weight of each word in each topic in the sentence. According to the document topic mining method and apparatus provided by the present application, the document topic is more comprehensively and accurately mined based on a PLSA (Probabilistic Latent Semantic Analysis) algorithm, and the correlation of document topic content is improved, thereby enabling a result of a search engine to be closer to semantic information of the document.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
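
The iterative estimation behind PLSA can be sketched with a small expectation-maximization loop. This is a simplified illustration that assumes document-level word counts (the abstract works at sentence level) and a fixed iteration count; the function names `plsa` and `top_words` are hypothetical, and ranking words by P(w|z) only stands in for the membership-weight screening.

```python
import random


def plsa(doc_word_counts, num_topics, iterations=50, seed=0):
    """Fit P(z|d) and P(w|z) by EM on a list of {word: count} dicts."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in doc_word_counts for w in counts})
    topics = range(num_topics)

    def normalized(values):
        total = sum(values) or 1.0  # guard against an all-zero row
        return [v / total for v in values]

    # Random positive initialisation, normalised to valid distributions.
    p_z_d = [normalized([rng.random() + 0.1 for _ in topics]) for _ in doc_word_counts]
    p_w_z = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
             for _ in topics]

    for _ in range(iterations):
        new_z_d = [[0.0] * num_topics for _ in doc_word_counts]
        new_w_z = [dict.fromkeys(vocab, 0.0) for _ in topics]
        for d, counts in enumerate(doc_word_counts):
            for w, n in counts.items():
                # E-step: posterior P(z | d, w) is proportional to P(z|d) * P(w|z).
                post = normalized([p_z_d[d][z] * p_w_z[z][w] for z in topics])
                # Accumulate expected counts for the M-step.
                for z in topics:
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [dict(zip(vocab, normalized([row[w] for w in vocab])))
                 for row in new_w_z]
    return p_z_d, p_w_z


def top_words(p_w_z, k=3):
    """List the k most probable words for each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:k] for dist in p_w_z]


docs = [{"cat": 3, "dog": 2}, {"stock": 3, "market": 2}, {"cat": 1, "dog": 1}]
doc_topics, topic_words = plsa(docs, num_topics=2)
print(top_words(topic_words, k=2))
```

The number of topics is passed in up front, which mirrors the preset topic mining number in the abstract.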

Microblog cell division method based on user comprehensive similarities

The invention designs a microblog cell division method based on user comprehensive similarities. The specific process is as follows: 1, microblog data is acquired, LDA topic model training is performed on a blog article set, and a user topic similarity matrix is obtained through topic mining based on feature extension; 2, a network topological graph with users as nodes and user relations as edges is constructed, and a user comprehensive similarity matrix is obtained from node link relevancy and topic similarities; and 3, a unique tag is first allocated to each node, the potential influence of each node is evaluated, the descending order of potential influence serves as the node selection order, the descending order of node comprehensive similarity serves as the tag update order of the nodes, and the tags are then updated iteratively. In this way, cell division can be performed on the microblog users through an improved tag propagation algorithm that takes the user comprehensive similarities into account, and the method has high application value for online public opinion monitoring, commercial user mining and the like.
Owner:JIANGSU UNIV +1
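
Step 3 can be sketched as a weighted tag propagation loop. This is a simplified stand-in rather than the patented algorithm: it assumes the comprehensive similarity matrix is given as a symmetric nested dict, uses each node's total similarity as a proxy for its potential influence, and lets a node adopt the tag with the largest summed similarity among its neighbours. The function name `propagate_labels` is hypothetical.

```python
from collections import defaultdict


def propagate_labels(similarity, max_rounds=20):
    """similarity: {node: {neighbor: weight}}, symmetric. Returns {node: tag}."""
    labels = {node: node for node in similarity}  # unique starting tag per node
    # Assumed influence proxy: total similarity to neighbours (weighted degree).
    order = sorted(similarity, key=lambda n: (-sum(similarity[n].values()), n))
    for _ in range(max_rounds):
        changed = False
        for node in order:
            votes = defaultdict(float)
            for neighbor, weight in similarity[node].items():
                votes[labels[neighbor]] += weight
            if not votes:
                continue
            # Highest total similarity wins; ties go to the smaller tag.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
print(propagate_labels(graph))
```

On this toy graph the two tightly linked triangles end up with two different tags, because the single weak edge between them carries too little similarity to pull either side over.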

Biterm topic model (BTM) sampling acceleration method

Status: Active | Publication: CN106776579A | Tags: Optimizing sample time complexity; Optimize mining time; Natural language data processing; Special data processing applications; Biterm topic model; Algorithm
The invention provides a Biterm topic model (BTM) sampling acceleration method. The method includes: establishing an alias table for each term, and selecting one biterm; sampling a new topic for the biterm from a corpus proposal and calculating the probability of acceptance; judging whether the probability of acceptance is greater than r and, if so, updating the biterm, otherwise performing no update; sampling another new topic for the biterm from a word proposal and calculating the probability of acceptance; judging whether the probability of acceptance is greater than r and, if so, updating the biterm, otherwise performing no update. With the method, the sampling time complexity of BTM can be optimized and the convergence rate of the BTM can be greatly increased without affecting the quality of the final topic clustering, which shortens the time needed for essay topic mining and for text topic mining in general.
Owner:TSINGHUA UNIV
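
The alias table is what makes each proposal draw cheap. Below is a minimal sketch of Vose's alias method plus an acceptance test. It is not the patented procedure: the function names are hypothetical, and reading r as a uniform random number in [0, 1) is an assumption, since the abstract does not define r.

```python
import random


def build_alias_table(probabilities):
    """Vose's alias method: return (prob, alias) lists for O(1) sampling."""
    n = len(probabilities)
    scaled = [p * n for p in probabilities]
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # The large entry donates mass to fill the small entry's column.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in constant time: pick a column, then flip a biased coin."""
    column = rng.randrange(len(prob))
    return column if rng.random() < prob[column] else alias[column]


def mh_accept(acceptance_probability, rng=random):
    """Accept a proposal when its acceptance probability exceeds r (assumed uniform)."""
    return acceptance_probability > rng.random()


prob, alias = build_alias_table([0.5, 0.3, 0.2])
rng = random.Random(0)
draws = [sample(prob, alias, rng) for _ in range(10000)]
print([draws.count(i) / len(draws) for i in range(3)])
```

Building the table costs time linear in the number of topics, but every later draw is constant time, which is where the reduction in sampling complexity would come from.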

User text information analysis method and device

The invention provides a user text information analysis method. The method includes: preprocessing to-be-analyzed text information; carrying out latent topic mining on the preprocessed to-be-analyzed text information to obtain the topic probability distribution of the text; calculating the similarity of the text according to the topic probability distribution, and performing user characteristic value clustering according to the similarity; performing digital marking on the clustered to-be-analyzed text information to obtain to-be-analyzed sample data; and inputting the to-be-analyzed sample data into a pre-established user preference analysis model to obtain a user preference analysis result. According to the scheme, the text similarity between users is calculated by deeply mining the text features of the users, and clustering analysis is performed according to the similarity distance, so that the structure of the hidden layer of the deep neural network is simplified and the learning efficiency of the deep neural network is improved.
Owner:BEIJING INFORMATION SCI & TECH UNIV
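
The similarity step can be sketched as below. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the example distributions.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def similarity_matrix(distributions):
    """Pairwise similarities between all users' topic distributions."""
    return [[cosine_similarity(p, q) for q in distributions] for p in distributions]


# Hypothetical topic distributions for three users over three topics.
users = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
sims = similarity_matrix(users)
print(sims[0][1] > sims[0][2])  # True: users 0 and 1 share a dominant topic
```

A clustering routine would then group users whose pairwise similarity is high; which clustering algorithm the patent uses is not stated.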

Veterinary drug residue knowledge graph construction method based on weighted LDA

The invention discloses a veterinary drug residue knowledge graph construction method based on weighted LDA (Latent Dirichlet Allocation). The method comprises the following steps: firstly, constructing a veterinary drug knowledge framework, and performing deep search and downloading literature by using a web crawler in combination with the knowledge framework; and aiming at topic noise existing in the LDA topic model and a feature word bias problem, performing topic mining by using a weighted LDA method, and downloading veterinary drug related literatures again; completing named entity identification and relationship extraction by using a dictionary-based model; and finally, utilizing the Neo4j graph database to construct a veterinary drug knowledge graph. The veterinary drug residue knowledge graph can be constructed, veterinary drug residue characteristic rules and causes of damage of veterinary drug residues to human bodies can be found out, the quality safety of meat, eggs and milk is guaranteed, and therefore the body health and life safety of people are protected.
Owner:CHINA AGRI UNIV

Scenic spot evaluation knowledge base construction method based on metaphor topic mining

The invention discloses a scenic spot evaluation knowledge base construction method based on metaphor topic mining. The method comprises the steps of: S1, constructing a scenic spot recessive multi-topic knowledge base by using a scenic spot recessive topic mining algorithm; S2, constructing a metaphor multi-topic knowledge base of the scenic spot by adopting a scenic spot metaphor topic feature mining algorithm; S3, constructing a scenic spot evaluation knowledge base based on the semantic matching calculation model of the scenic spot corpus, and identifying the theme to which the tourist comment data belongs and the emotional tendency corresponding to the theme based on the scenic spot evaluation knowledge base. The invention thus constructs a scenic spot evaluation knowledge base that takes metaphor information into account. With this technical scheme, the fine-grained theme of each comment on an internet tourism website, and the emotional tendency toward that theme, can be accurately judged. This provides data support for tourists and helps them make decisions that match their preferences, and it helps scenic spot managers improve scenic spot services and the online reputation of the scenic spots.
Owner:INST OF REMOTE SENSING & DIGITAL EARTH CHINESE ACADEMY OF SCI

Urban functional area identification process based on space-time semantic mining

The invention discloses an urban functional area identification process based on space-time semantic mining, which involves documents, words, basic functional units, space-time data, a topic model, document topic distribution and unit function distribution. The process first attempts to discover the hidden functions of an area through the topic model. By analogy with text topic mining, the basic functional units correspond to the documents in a corpus, the space-time data in a basic functional unit corresponds to the words in a document, and the unit function distribution produced by the topic model corresponds to the document topic distribution. The urban space-time data used is representative Sina microblog location check-in data. Each check-in record comprises user information, the spatial coordinates of the check-in position, the publishing time, the published text and the like, so the dynamic activity patterns of people can be reflected from different angles. Meanwhile, POIs in the research area are obtained from Baidu Map, and function recognition of the area is achieved.
Owner:武汉市中城事大数据有限责任公司
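
The document analogy can be made concrete with a small sketch that turns check-in records into per-unit bags of "words". Everything specific here is an assumption for illustration: the record layout (unit id, hour, category), the day/night split and the function name `checkins_to_documents` are not taken from the patent.

```python
from collections import Counter, defaultdict


def checkins_to_documents(checkins):
    """Group check-ins by basic functional unit and count space-time 'words'."""
    documents = defaultdict(Counter)
    for unit_id, hour, category in checkins:
        # Assumed encoding: a word combines the place category and a time period.
        period = "day" if 6 <= hour < 18 else "night"
        documents[unit_id][f"{category}@{period}"] += 1
    return dict(documents)


checkins = [
    ("u1", 9, "office"), ("u1", 10, "office"),
    ("u1", 20, "restaurant"), ("u2", 23, "bar"),
]
unit_docs = checkins_to_documents(checkins)
print(unit_docs["u1"]["office@day"])  # 2
```

Each unit's counter can then be fed to a topic model exactly as a document's word counts would be, and the resulting topic distribution read as the unit's function distribution.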